Performance Data for Intel® AI Data Center Products

Find the latest AI benchmark performance data for Intel Data Center products, including detailed hardware and software configurations.

Pretrained models, sample scripts, best practices, and tutorials

Measurements were taken using:

5th Generation Intel® Xeon® Scalable Processors

Intel® Xeon® Platinum 8592+ Processor (64 Cores)

Inference

Framework Version	Model	Usage	Precision	Throughput	Perf/Watt	Latency (ms)	Batch size
Intel PyTorch 2.1 DeepSpeed	GPT-J 6B Token size 1024/128	text-generation, Beam Search, Width=4	int8			35	1
Intel PyTorch 2.1 DeepSpeed	GPT-J 6B Token size 1024/128	text-generation, Beam Search, Width=4	int8	173 tokens/s		92.5	8
Intel PyTorch 2.1 DeepSpeed	GPT-J 6B Token size 1024/128	text-generation, Beam Search, Width=4	bf16			52.5	1
Intel PyTorch 2.1 DeepSpeed	GPT-J 6B Token size 1024/128	text-generation, Beam Search, Width=4	bf16			98.5	8
Intel PyTorch 2.2 MLPerf v4.0	GPT-J (MLPerf v4.0, offline, 99.0% acc)	CNN-DailyMail News Text Summarization (input 13,368)	int4	3.61 samp/s			8
Intel PyTorch 2.1 DeepSpeed	LLaMA2-7B Token size 1024/128	text-generation, Beam Search, Width=4	int8			41.5	1
Intel PyTorch 2.1 DeepSpeed	LLaMA2-7B Token size 1024/128	text-generation, Beam Search, Width=4	int8	149.5 tokens/s		107	8
Intel PyTorch 2.1 DeepSpeed	LLaMA2-7B Token size 1024/128	text-generation, Beam Search, Width=4	bf16			59.5	1
Intel PyTorch 2.1 DeepSpeed	LLaMA2-7B Token size 1024/128	text-generation, Beam Search, Width=4	bf16	142.2 tokens/s		112.5	8
OpenVINO 2023.2	LLaMA2-7b Token size 32/512	GenAI_chat	Int4	11.3 tokens/s		88.44	1
OpenVINO 2023.2	LLaMA2-7b Token size 32/512	GenAI_chat	int8	13.5 tokens/s		73.74	1
OpenVINO 2023.2	LLaMA2-7b Token size 32/512	GenAI_chat	fp32	11.3 tokens/s		88.39	1
OpenVINO 2023.2	LLaMA2-7b Token size 80/512	GenAI_chat	Int4	11.4 tokens/s		87.17	1
OpenVINO 2023.2	LLaMA2-7b Token size 80/512	GenAI_chat	int8	13.6 tokens/s		73.09	1
OpenVINO 2023.2	LLaMA2-7b Token size 80/512	GenAI_chat	fp32	11.2 tokens/s		89.00	1
OpenVINO 2023.2	LLaMA2-7b Token size 142/512	GenAI_chat	Int4	11.5 tokens/s		86.63	1
OpenVINO 2023.2	LLaMA2-7b Token size 142/512	GenAI_chat	int8	13.3 tokens/s		75.15	1
OpenVINO 2023.2	LLaMA2-7b Token size 142/512	GenAI_chat	fp32	11.1 tokens/s		89.73	1
OpenVINO 2023.2	Stable Diffusion 2.1, 20 Steps, 64 Prompts	GenAI_text_image	int8	0.24 img/s		4,160	1
MLPerf Inference v4.0	Stable Diffusion XL (offline)	Image Generation	bf16	0.19 samp/s			8
MLPerf Inference v4.0	ResNet50 v1.5 (offline)	Image Recognition	int8	25,289.6 samp/s			256
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	Int8	12,862.56 img/s	13.23		1
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	Int8	19,386.47 img/s	19.21		64
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	bf16	8,211.8 img/s	8.13		1
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	bf16	10,187.87 img/s	10.82		64
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	fp32	1,773.68 img/s	1.74		1
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	fp32	1,703.77 img/s	1.57		64
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	bf32	2,431.26 img/s	2.40		1
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	bf32	2,686.97 img/s	2.67		64
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	Int8	9,726.18 img/s	9.67		1
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	Int8	16,036.8 img/s	17.01		32
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	bf16	6,782.09 img/s	7.04		1
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	bf16	9,312.72 img/s	9.40		32
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	fp32	1,560.99 img/s	1.45		1
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	fp32	1,663.44 img/s	1.57		32
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	bf32	2,013.88 img/s	1.84		1
Intel TensorFlow 2.14	ResNet50 v1.5	Image Recognition	bf32	2,874.29 img/s	2.73		32
OpenVINO 2023.2	ResNet50 v1.5	Image Recognition	Int8	18,674.37 img/s	26.68		1
OpenVINO 2023.2	ResNet50 v1.5	Image Recognition	bf16	11,537.06 img/s	16.48		1
OpenVINO 2023.2	ResNet50 v1.5	Image Recognition	fp32	1,721.58 img/s	2.46		1
MLPerf Inference v4.0	BERT-Large (offline, 99.0% acc)	Natural Language Processing	int8	1,668.5 samp/s			1,300
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	int8	411.14 sent/s	0.42		1
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	int8	455.33 sent/s	0.45		16
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	bf16	243.89 sent/s	0.24		1
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	bf16	278.00 sent/s	0.25		44
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	fp32	44.56 sent/s	0.04		1
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	fp32	50.49 sent/s	0.05		16
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	bf32	98.49 sent/s	0.09		1
Intel PyTorch 2.1	BERTLarge	Natural Language Processing	bf32	96.98 sent/s	0.09		16
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	int8	323.58 sent/s	0.32		1
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	int8	324.56 sent/s	0.33		12
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	bf16	224.04 sent/s	0.22		1
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	bf16	231.37 sent/s	0.23		28
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	fp32	55.34 sent/s	0.05		1
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	fp32	48.46 sent/s	0.05		12
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	bf32	101.93 sent/s	0.10		1
Intel TensorFlow 2.14	BERTLarge	Natural Language Processing	bf32	98.81 sent/s	0.10		12
OpenVINO 2023.2	BERTLarge	Natural Language Processing	int8	373.6867 sent/s	0.37		1
OpenVINO 2023.2	BERTLarge	Natural Language Processing	int8	388.25 sent/s	0.39		32
OpenVINO 2023.2	BERTLarge	Natural Language Processing	bf16	244.25 sent/s	0.24		1
OpenVINO 2023.2	BERTLarge	Natural Language Processing	bf16	281.79 sent/s	0.27		40
OpenVINO 2023.2	BERTLarge	Natural Language Processing	fp32	57.16667 sent/s	0.06		1
OpenVINO 2023.2	BERTLarge	Natural Language Processing	fp32	55.67 sent/s	0.05		16
Intel PyTorch 2.1	DLRM Criteo Terabyte	Recommender	int8	23,444,587 rec/s	23611.92		128
Intel PyTorch 2.1	DLRM Criteo Terabyte	Recommender	bf16	13,223,343 rec/s	12742.32		128
Intel PyTorch 2.1	DLRM Criteo Terabyte	Recommender	fp32	2,742,037 rec/s	2615.42		128
Intel PyTorch 2.1	DLRM Criteo Terabyte	Recommender	bf32	6,760,005 rec/s	6699.18		128
MLPerf Inference v4.0	DLRM-v2 (offline, 99.9% acc)	Recommender	int8	9,111.08 samp/s			400
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	int8	6,380.26 sent/s	6.80		1
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	int8	10,701.44 sent/s	11.39		104
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	bf16	4,651.69 sent/s	4.97		1
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	bf16	6,864.75 sent/s	7.23		88
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	fp32	1,121.45 sent/s	1.12		1
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	fp32	1,205.86 sent/s	1.27		32
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	bf32	2,161.93 sent/s	2.15		1
Intel PyTorch 2.1	DistilBERT	Natural Language Processing	bf32	2,584.98 sent/s	2.63		56
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	int8	77.94 sent/s	0.07		1
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	int8	334.65 sent/s	0.31		448
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	bf16	52 sent/s	0.05		1
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	bf16	367.07 sent/s	0.35		448
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	fp32	1,099.6 sent/s	26.53		1
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	fp32	137.37 sent/s	0.12		448
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	bf32	24.86 sent/s	0.02		1
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	bf32	155.04 sent/s	0.14		448
OpenVINO 2023.2	3D-Unet	Image Segmentation	int8	30.31 samples/s	0.03		1
OpenVINO 2023.2	3D-Unet	Image Segmentation	int8	27.18333 samples/s	0.02		6
OpenVINO 2023.2	3D-Unet	Image Segmentation	bf16	15.67667 samples/s	0.01		1
OpenVINO 2023.2	3D-Unet	Image Segmentation	bf16	3.18 samples/s	0.00		7
OpenVINO 2023.2	3D-Unet	Image Segmentation	fp32	3.49 samples/s	0.00		1
OpenVINO 2023.2	3D-Unet	Image Segmentation	fp32	14.40 samples/s	0.01		3
OpenVINO 2023.2	SSD-ResNet34 COCO 2017 (1200 x 1200)	Object Detection	int8	590.2267 img/s	0.57		1
OpenVINO 2023.2	SSD-ResNet34 COCO 2017 (1200 x 1200)	Object Detection	bf16	297.79 img/s	0.28		1
OpenVINO 2023.2	SSD-ResNet34 COCO 2017 (1200 x 1200)	Object Detection	fp32	36.92 img/s	0.04		1
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	int8	1,679.87 fps	1.73		1
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	int8	2,481.66 fps	2.56		58
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	bf16	802.44 fps	0.80		1
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	bf16	1,175.18 fps	1.10		72
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	fp32	186.33 fps	0.19		1
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	fp32	202.33 fps	0.19		40
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	bf32	279.07 fps	0.28		1
Intel PyTorch 2.1	ResNeXt101 32x16d ImageNet	Image Classification	bf32	320.62 fps	0.29		58
OpenVINO 2023.2	Yolo-v8n	Object Detection	Int8	3,513.54 img/s			1
OpenVINO 2023.2	Yolo-v8n	Object Detection	bf16	3,632.55 img/s			1
OpenVINO 2023.2	Yolo-v8n	Object Detection	fp32	1,249.91 img/s			1
MLPerf Inference v4.0	RetinaNet (offline)	Object Detection	int8	371.08 samp/s			2
MLPerf Inference v4.0	RNN-T (offline)	Speech-to-text	int8+bf16	8,679.48 samp/s			256
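To compare precisions in the inference table above, a small helper can parse a few rows and express each throughput relative to fp32. This is a minimal sketch and not part of the published data: `ROWS` copies the four Intel PyTorch 2.1 ResNet50 v1.5 batch-size-1 rows from the table, and the names `parse_throughput` and `speedup_vs_fp32` are hypothetical.

```python
# Four rows copied from the inference table above
# (Intel PyTorch 2.1, ResNet50 v1.5, batch size 1). Columns are tab-separated:
# framework, model, usage, precision, throughput, perf/watt, latency, batch size.
ROWS = (
    "Intel PyTorch 2.1\tResNet50 v1.5\tImage Recognition\tInt8\t12,862.56 img/s\t13.23\t\t1\n"
    "Intel PyTorch 2.1\tResNet50 v1.5\tImage Recognition\tbf16\t8,211.8 img/s\t8.13\t\t1\n"
    "Intel PyTorch 2.1\tResNet50 v1.5\tImage Recognition\tfp32\t1,773.68 img/s\t1.74\t\t1\n"
    "Intel PyTorch 2.1\tResNet50 v1.5\tImage Recognition\tbf32\t2,431.26 img/s\t2.40\t\t1\n"
)


def parse_throughput(cell):
    """Turn a cell such as '12,862.56 img/s' into the float 12862.56."""
    return float(cell.split()[0].replace(",", ""))


def speedup_vs_fp32(rows_text):
    """Map each precision (lower-cased) to its throughput relative to fp32."""
    throughput = {}
    for line in rows_text.splitlines():
        fields = line.split("\t")
        throughput[fields[3].lower()] = parse_throughput(fields[4])
    baseline = throughput["fp32"]
    return {precision: value / baseline for precision, value in throughput.items()}


for precision, ratio in speedup_vs_fp32(ROWS).items():
    print(f"{precision}: {ratio:.2f}x")
```

On these four rows, int8 comes out at roughly 7.3x and bf16 at roughly 4.6x the fp32 throughput. Comparisons are only meaningful between rows that share the same framework, model, and batch size.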

Training

Transfer Learning / Fine Tuning

Framework Version	Model	Usage	Precision	TTT (minutes)	Accuracy	Batch Size	Ranks
Transformers 4.31, Intel Extension for PyTorch 2.0.1, PEFT 0.4.0	GPT-J 6B (GLUE MNLI dataset)	Fine-tuning, Text-generation	bf16	184.20	82.2	8	1
Transformers 4.34.1, Intel PyTorch 2.1.0, PEFT 0.5.0, Intel® oneCCL v2.1.0	BioGPT 1.5B (PubMedQA dataset)	Response generation	bf16	39.80	79.4	8	8
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0	ResNet50 v1.5 (Colorectal histology dataset)	Colorectal cancer detection	fp32	6.98	94.1	32	64
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0	ResNet50 v1.5 (Colorectal histology dataset)	Colorectal cancer detection	bf16	4.08	94.9	32	64
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0	ResNet50 v1.5 (Colorectal histology dataset)	Colorectal cancer detection	fp32	5.34	94.1	32	128
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0	ResNet50 v1.5 (Colorectal histology dataset)	Colorectal cancer detection	bf16	2.90	94.9	32	128
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100	BERTLarge Uncased (IMDb dataset)	Sentiment Analysis	fp32	47.95	93.84	64	4
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100	BERTLarge Uncased (IMDb dataset)	Sentiment Analysis	bf16	15.96	93.8	64	4
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100	BERTLarge Uncased (GLUE SST2 dataset)	Sentiment Analysis	fp32	10.48	92.2	256	4
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100	BERTLarge Uncased (GLUE SST2 dataset)	Sentiment Analysis	bf16	2.93	92.09	256	4
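The time-to-train (TTT) figures above can be turned into two derived numbers: the bf16 speedup over fp32 at a fixed rank count, and the scaling efficiency when the rank count doubles. The sketch below uses the ResNet50 colorectal histology rows. It assumes that each rank is an equal parallel worker, so ideal scaling would halve TTT when ranks double; the source does not state this, and the function names are hypothetical.

```python
# (precision, ranks) -> time to train in minutes, copied from the
# ResNet50 (Colorectal histology dataset) rows in the table above.
TTT_MINUTES = {
    ("fp32", 64): 6.98,
    ("bf16", 64): 4.08,
    ("fp32", 128): 5.34,
    ("bf16", 128): 2.90,
}


def precision_speedup(ttt, ranks):
    """How many times faster bf16 finishes than fp32 at the same rank count."""
    return ttt[("fp32", ranks)] / ttt[("bf16", ranks)]


def scaling_efficiency(ttt, precision, base_ranks, scaled_ranks):
    """Measured speedup divided by the ideal (linear) speedup.

    Assumption: ranks are equal workers, so ideal speedup equals
    scaled_ranks / base_ranks. A value of 1.0 means perfect scaling.
    """
    measured = ttt[(precision, base_ranks)] / ttt[(precision, scaled_ranks)]
    ideal = scaled_ranks / base_ranks
    return measured / ideal


for ranks in (64, 128):
    print(f"bf16 vs fp32 at {ranks} ranks: {precision_speedup(TTT_MINUTES, ranks):.2f}x")
for precision in ("fp32", "bf16"):
    eff = scaling_efficiency(TTT_MINUTES, precision, 64, 128)
    print(f"{precision} scaling efficiency 64 -> 128 ranks: {eff:.0%}")
```

Fine-tuning runs include fixed costs such as data loading and evaluation, so efficiencies well below 100% on runs this short are not surprising.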

Training Throughput

Framework Version	Model/Dataset	Usage	Precision	Throughput	Perf/Watt	Batch size
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	fp32	175.29 img/s	0.22	128
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	bf16	396.24 img/s	0.52	256
Intel PyTorch 2.1	ResNet50 v1.5	Image Recognition	bf32	197.14 img/s	0.25	128
Intel TensorFlow 2.14	ResNet50 v1.5 ImageNet (224 x 224)	Image Recognition	fp32	145.93 img/s	0.19	512
Intel TensorFlow 2.14	ResNet50 v1.5 ImageNet (224 x 224)	Image Recognition	bf16	354.45 img/s	0.46	512
Intel TensorFlow 2.14	ResNet50 v1.5 ImageNet (224 x 224)	Image Recognition	bf32	166.37 img/s	0.21	512
Intel PyTorch 2.1	DLRM Criteo Terabyte, QUAD Mode	Recommender	fp32	290,772.24 rec/s	359.83	32,768
Intel PyTorch 2.1	DLRM Criteo Terabyte, QUAD Mode	Recommender	bf16	862,286.46 rec/s	1,055.35	32,768
Intel PyTorch 2.1	DLRM Criteo Terabyte, QUAD Mode	Recommender	bf32	417,584.33 rec/s	504.29	32,768
Intel TensorFlow 2.14	SSD-ResNet34 COCO 2017 (1200 x 1200)	Object Detection	fp32	61.25 img/s	0.09	448
Intel TensorFlow 2.14	SSD-ResNet34 COCO 2017 (1200 x 1200)	Object Detection	bf16	219.77 img/s	0.31	448
Intel TensorFlow 2.14	SSD-ResNet34 COCO 2017 (1200 x 1200)	Object Detection	bf32	83.44 img/s	0.11	448
Intel PyTorch 2.1	RNNT LibriSpeech	Speech Recognition	fp32	4.35 fps	0.01	64
Intel PyTorch 2.1	RNNT LibriSpeech	Speech Recognition	bf16	35.13 fps	0.04	64
Intel PyTorch 2.1	RNNT LibriSpeech	Speech Recognition	bf32	13.65 fps	0.02	32
Intel PyTorch 2.1	MaskR-CNN COCO 2017	Object Detection	fp32	4.8 img/s	0.01	128
Intel PyTorch 2.1	MaskR-CNN COCO 2017	Object Detection	bf16	16.43 img/s	0.02	128
Intel PyTorch 2.1	MaskR-CNN COCO 2017	Object Detection	bf32	5.37 img/s	0.01	96
Intel PyTorch 2.1	BERTLarge Wikipedia 2020/01/01 seq len=512	Natural Language Processing	fp32	4.41 sent/s	0.01	64
Intel PyTorch 2.1	BERTLarge Wikipedia 2020/01/01 seq len=512	Natural Language Processing	bf16	12.53 sent/s	0.02	28
Intel PyTorch 2.1	BERTLarge Wikipedia 2020/01/01 seq len=512	Natural Language Processing	bf32	5.52 sent/s	0.01	56
Intel TensorFlow 2.14	BERTLarge Wikipedia 2020/01/01 seq len=512	Natural Language Processing	fp32	5.38 sent/s	0.01	64
Intel TensorFlow 2.14	BERTLarge Wikipedia 2020/01/01 seq len=512	Natural Language Processing	bf16	11.74 sent/s	0.02	64
Intel TensorFlow 2.14	BERTLarge Wikipedia 2020/01/01 seq len=512	Natural Language Processing	bf32	6.07 sent/s	0.01	64
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	fp32	15,671.55 sent/s	16.95	42,000
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	bf16	40,653.1 sent/s	43.77	42,000
Intel TensorFlow 2.14	Transformer MLPerf	Language Translation	bf32	15,316.08 sent/s	15.44	42,000
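The Throughput and Perf/Watt columns above can be combined to estimate the power draw behind each measurement. The sketch below assumes that Perf/Watt is simply throughput divided by average power in watts. The source does not define the metric or say where power was measured, so the result is an estimate only. The helper names are hypothetical, and the sample uses the DLRM training rows because their Perf/Watt values carry enough digits to divide meaningfully.

```python
# (precision, throughput cell, perf/watt cell) copied from the
# DLRM Criteo Terabyte training rows in the table above.
DLRM_ROWS = [
    ("fp32", "290,772.24 rec/s", "359.83"),
    ("bf16", "862,286.46 rec/s", "1,055.35"),
    ("bf32", "417,584.33 rec/s", "504.29"),
]


def to_float(cell):
    """Parse the leading number of a table cell, ignoring commas and units."""
    return float(cell.split()[0].replace(",", ""))


def implied_watts(throughput_cell, perf_per_watt_cell):
    """Estimate average power, assuming perf/watt = throughput / watts."""
    return to_float(throughput_cell) / to_float(perf_per_watt_cell)


for precision, throughput, perf_per_watt in DLRM_ROWS:
    watts = implied_watts(throughput, perf_per_watt)
    print(f"{precision}: about {watts:.0f} W implied")
```

Rows whose Perf/Watt is rounded to one or two significant digits (for example 0.01) give very coarse estimates, so this check is best limited to rows with larger Perf/Watt values.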

Hardware and software configuration (measured October 24, 2023):

Deep Learning configuration:

Hardware configuration for Intel® Xeon® Platinum 8592+ processor (code named Emerald Rapids): 2 sockets for inference, 1 socket for training, 64 cores, 350 watts, 1024 GB (16 x 64 GB) DDR5 5600 MT/s memory, operating system CentOS* Stream 9. Measurements use Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*, Intel® Extension for TensorFlow*, and Intel® Distribution of OpenVINO™ toolkit. Measurements may vary. If the dataset is not listed, a synthetic dataset was used to measure performance.
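Because the int8 and bf16 results depend on Intel AMX, it can help to confirm that a Linux host exposes AMX before trying to reproduce them. The following is a minimal sketch, not an Intel tool: it looks for the `amx_tile`, `amx_int8`, and `amx_bf16` CPU flags that the Linux kernel reports in `/proc/cpuinfo`, and the function name `amx_support` is hypothetical.

```python
from pathlib import Path

# CPU feature flags that the Linux kernel lists in /proc/cpuinfo for AMX.
AMX_FLAGS = ("amx_tile", "amx_int8", "amx_bf16")


def amx_support(cpuinfo_text):
    """Return {flag: bool} for each AMX flag, given /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {name: name in flags for name in AMX_FLAGS}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(amx_support(cpuinfo.read_text()))
else:
    print("No /proc/cpuinfo found; this check only applies on Linux.")
```

A missing flag means the kernel does not report that AMX feature on this host, in which case int8 and bf16 results comparable to the tables above should not be expected.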

Transfer Learning configuration:

Hardware configuration for Intel® Xeon® Platinum 8592+ processor (code named Emerald Rapids): 2 sockets, 64 cores, 350 watts, 16 x 64 GB DDR5 5600 memory, BIOS version 3B05.TEL4P1, operating system CentOS Stream 8. Measurements use Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) v2.6.0 optimized kernels integrated into Intel® Extension for PyTorch* v2.0.1 and Intel® Extension for TensorFlow* v2.14, Intel® oneAPI Data Analytics Library (oneDAL) 2023.1 optimized kernels integrated into Intel® Extension for Scikit-learn* v2023.1, Intel® Distribution of Modin* v2.1.1, and Intel® oneAPI Math Kernel Library (oneMKL) v2023.1. Measurements may vary.
