水木刀, Xinyu Chen

An AI-driven Search-by-Image Engine on KV260

The DPU and a customized graph-search accelerator on the KV260 make this AI-driven search-by-image engine more efficient than ever!

Intermediate
Full instructions provided

Things used in this project

Hardware components

AMD Kria KV260 Vision AI Starter Kit
×1

Software apps and online services

AMD Vitis Unified Software Platform
AMD Vivado Design Suite
Caffe framework
Crow C++ web framework

Story


Code

Demo File

C/C++
You can download it and deploy our project on the KV260 board quickly.
No preview (download only).

dpu_conf.vh

C/C++
The architecture configuration of the DPU
//Setting the DPU architecture. For more details, please read PG338.


/*====== Architecture Options ======*/
// |------------------------------------------------------|
// | Support 8 DPU size
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | `define B512               
// +------------------------------------------------------+
// | `define B800                 
// +------------------------------------------------------+
// | `define B1024                 
// +------------------------------------------------------+
// | `define B1152                 
// +------------------------------------------------------+
// | `define B1600                 
// +------------------------------------------------------+
// | `define B2304                 
// +------------------------------------------------------+
// | `define B3136                 
// +------------------------------------------------------+
// | `define B4096                 
// |------------------------------------------------------|

`define B1152 

// |------------------------------------------------------|
// | If the FPGA has Uram. You can define URAM_EN parameter               
// | if change, Don't need update model
// +------------------------------------------------------+
// | for zcu104 : `define URAM_ENABLE               
// +------------------------------------------------------+
// | for zcu102 : `define URAM_DISABLE                 
// |------------------------------------------------------|

`define URAM_ENABLE 

//config URAM
`ifdef URAM_ENABLE
    `define def_UBANK_IMG_N          5
    `define def_UBANK_WGT_N          17
    `define def_UBANK_BIAS           1
`elsif URAM_DISABLE
    `define def_UBANK_IMG_N          0
    `define def_UBANK_WGT_N          0
    `define def_UBANK_BIAS           0
`endif

// |------------------------------------------------------|
// | You can use DRAM if FPGA has extra LUTs               
// | if change, Don't need update model
// +------------------------------------------------------+
// | Enable DRAM  : `define DRAM_ENABLE               
// +------------------------------------------------------+
// | Disable DRAM : `define DRAM_DISABLE                 
// |------------------------------------------------------|

`define DRAM_DISABLE 

//config DRAM
`ifdef DRAM_ENABLE
    `define def_DBANK_IMG_N          1 
    `define def_DBANK_WGT_N          1
    `define def_DBANK_BIAS           1
`elsif DRAM_DISABLE
    `define def_DBANK_IMG_N          0
    `define def_DBANK_WGT_N          0
    `define def_DBANK_BIAS           0
`endif

// |------------------------------------------------------|
// | RAM Usage Configuration              
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | RAM Usage High : `define RAM_USAGE_HIGH               
// +------------------------------------------------------+
// | RAM Usage Low  : `define RAM_USAGE_LOW                 
// |------------------------------------------------------|

`define RAM_USAGE_LOW

// |------------------------------------------------------|
// | Channel Augmentation Configuration
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | Enable  : `define CHANNEL_AUGMENTATION_ENABLE              
// +------------------------------------------------------+
// | Disable : `define CHANNEL_AUGMENTATION_DISABLE                
// |------------------------------------------------------|

`define CHANNEL_AUGMENTATION_ENABLE

// |------------------------------------------------------|
// | DepthWiseConv Configuration
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | Enable  : `define DWCV_ENABLE              
// +------------------------------------------------------+
// | Disable : `define DWCV_DISABLE               
// |------------------------------------------------------|

`define DWCV_ENABLE

// |------------------------------------------------------|
// | Pool Average Configuration
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | Enable  : `define POOL_AVG_ENABLE              
// +------------------------------------------------------+
// | Disable : `define POOL_AVG_DISABLE                
// |------------------------------------------------------|

`define POOL_AVG_ENABLE

// |------------------------------------------------------|
// | support multiplication of two feature maps
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | Enable  : `define ELEW_MULT_ENABLE           
// +------------------------------------------------------+
// | Disable : `define ELEW_MULT_DISABLE               
// |------------------------------------------------------|

`define ELEW_MULT_DISABLE

// +------------------------------------------------------+
// | RELU Type Configuration
// | It relates to model. if change, must update model
// +------------------------------------------------------+
// | `define RELU_RELU6
// +------------------------------------------------------+
// | `define RELU_LEAKYRELU_RELU6
// |------------------------------------------------------|

`define RELU_LEAKYRELU_RELU6

// |------------------------------------------------------|
// | DSP48 Usage Configuration  
// | Use dsp replace of lut in conv operate 
// | if change, Don't need update model
// +------------------------------------------------------+
// | `define DSP48_USAGE_HIGH              
// +------------------------------------------------------+
// | `define DSP48_USAGE_LOW                
// |------------------------------------------------------|

`define DSP48_USAGE_HIGH 

// |------------------------------------------------------|
// | Power Configuration
// | if change, Don't need update model
// +------------------------------------------------------+
// | `define LOWPOWER_ENABLE              
// +------------------------------------------------------+
// | `define LOWPOWER_DISABLE               
// |------------------------------------------------------|

`define LOWPOWER_DISABLE

// |------------------------------------------------------|
// | DEVICE Configuration
// | if change, Don't need update model
// +------------------------------------------------------+
// | `define MPSOC              
// +------------------------------------------------------+
// | `define ZYNQ7000               
// |------------------------------------------------------|

`define MPSOC

dpu_googlenet_hash48.prototxt

Protobuf
The architecture of the hash-MiniGoogLeNet model (Caffe prototxt)
#########################################################################
#input layer for deployment

#name: "miniGoogleNet for CIFAR10, model 1"
#layer {
#  name: "data"
#  type: "Input"
#  top: "data"
#  input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 32 } }
#}

#########################################################################
# layers for TRAIN and TEST

name: "miniGoogleNet for CIFAR10, model 3"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 32
    #mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
        mean_value: 104
        mean_value: 117
        mean_value: 123
  }
  data_param {
    source: "../../data/cifar10/cifar10_train_leveldb"
    batch_size: 32
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 32
    #mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
        mean_value: 104
        mean_value: 117
        mean_value: 123

  }
  data_param {
    source: "../../data/cifar10/cifar10_val_leveldb"
    batch_size: 32
  }
}


############################################################
#L1 MiniGoogLeNet.conv_module(inputs, 96, 3, 3, (1, 1), chanDim)
############################################################
layer {
  name: "conv1/3x3_s1"
  type: "Convolution"
  bottom: "data"
  top: "conv1/3x3_s1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
      std: 0.1
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "conv1/bn1"
  type: "BatchNorm"
  bottom: "conv1/3x3_s1"
  top: "conv1/3x3_s1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "conv1/scale1"
  type: "Scale"
  bottom: "conv1/3x3_s1"
  top: "conv1/3x3_s1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "conv1/relu1"
  type: "ReLU"
  bottom: "conv1/3x3_s1"
  top: "conv1/3x3_s1"
}
####################################################################
# L2 MiniGoogLeNet.inception_module(x, 32, 32, chanDim)
####################################################################
layer {
  name: "inception_2a/1x1"
  type: "Convolution"
  bottom: "conv1/3x3_s1"
  top: "inception_2a/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_2a/1x1/bn1"
  type: "BatchNorm"
  bottom: "inception_2a/1x1"
  top: "inception_2a/1x1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_2a/1x1/scale1"
  type: "Scale"
  bottom: "inception_2a/1x1"
  top: "inception_2a/1x1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_2a/1x1/relu1"
  type: "ReLU"
  bottom: "inception_2a/1x1"
  top: "inception_2a/1x1"
}

layer {
  name: "inception_2a/3x3"
  type: "Convolution"
  bottom: "conv1/3x3_s1"
  top: "inception_2a/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_2a/3x3/bn1"
  type: "BatchNorm"
  bottom: "inception_2a/3x3"
  top: "inception_2a/3x3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_2a/3x3/scale1"
  type: "Scale"
  bottom: "inception_2a/3x3"
  top: "inception_2a/3x3"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_2a/3x3/relu1"
  type: "ReLU"
  bottom: "inception_2a/3x3"
  top: "inception_2a/3x3"
}

layer {
  name: "inception_2a/output"
  type: "Concat"
  bottom: "inception_2a/1x1"
  bottom: "inception_2a/3x3"
  top: "inception_2a/output"
}
########################################################################
# L3 MiniGoogLeNet.inception_module(x, 32, 48, chanDim)
########################################################################
layer {
  name: "inception_3a/1x1"
  type: "Convolution"
  bottom: "inception_2a/output"
  top: "inception_3a/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_3a/1x1/bn1"
  type: "BatchNorm"
  bottom: "inception_3a/1x1"
  top: "inception_3a/1x1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_3a/1x1/scale1"
  type: "Scale"
  bottom: "inception_3a/1x1"
  top: "inception_3a/1x1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_3a/1x1/relu1"
  type: "ReLU"
  bottom: "inception_3a/1x1"
  top: "inception_3a/1x1"
}

layer {
  name: "inception_3a/3x3"
  type: "Convolution"
  bottom: "inception_2a/output"
  top: "inception_3a/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 48
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_3a/3x3/bn1"
  type: "BatchNorm"
  bottom: "inception_3a/3x3"
  top: "inception_3a/3x3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_3a/3x3/scale1"
  type: "Scale"
  bottom: "inception_3a/3x3"
  top: "inception_3a/3x3"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_3a/3x3/relu1"
  type: "ReLU"
  bottom: "inception_3a/3x3"
  top: "inception_3a/3x3"
}

layer {
  name: "inception_3a/output"
  type: "Concat"
  bottom: "inception_3a/1x1"
  bottom: "inception_3a/3x3"
  top: "inception_3a/output"
}

########################################################################
# downsample module
#L4 MiniGoogLeNet.downsample_module(x, 80, chanDim)
########################################################################

layer {
  name: "downsample_4/3x3_s2"
  type: "Convolution"
  bottom: "inception_3a/output"
  top: "downsample_4/3x3_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 80
    kernel_size: 3
    stride:2
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "downsample_4/3x3_s2/bn1"
  type: "BatchNorm"
  bottom: "downsample_4/3x3_s2"
  top: "downsample_4/3x3_s2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "downsample_4/3x3_s2/scale1"
  type: "Scale"
  bottom: "downsample_4/3x3_s2"
  top: "downsample_4/3x3_s2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "downsample_4/3x3_s2/relu1"
  type: "ReLU"
  bottom: "downsample_4/3x3_s2"
  top: "downsample_4/3x3_s2"
}
layer {
  name: "downsample_4/pool_s2"
  type: "Pooling"
  bottom: "inception_3a/output"
  top: "downsample_4/pool_s2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
    #pad: 1
  }
}
layer {
  name: "downsample_4/output"
  type: "Concat"
  bottom: "downsample_4/3x3_s2"
  bottom: "downsample_4/pool_s2"
  top: "downsample_4/output"
}

########################################################################
# L5 MiniGoogLeNet.inception_module(x, 112, 48, chanDim)
########################################################################
layer {
  name: "inception_5a/1x1"
  type: "Convolution"
  bottom: "downsample_4/output"
  top: "inception_5a/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 112
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_5a/1x1/bn1"
  type: "BatchNorm"
  bottom: "inception_5a/1x1"
  top: "inception_5a/1x1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_5a/1x1/scale1"
  type: "Scale"
  bottom: "inception_5a/1x1"
  top: "inception_5a/1x1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_5a/1x1/relu1"
  type: "ReLU"
  bottom: "inception_5a/1x1"
  top: "inception_5a/1x1"
}

layer {
  name: "inception_5a/3x3"
  type: "Convolution"
  bottom: "downsample_4/output"
  top: "inception_5a/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 48
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_5a/3x3/bn1"
  type: "BatchNorm"
  bottom: "inception_5a/3x3"
  top: "inception_5a/3x3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_5a/3x3/scale1"
  type: "Scale"
  bottom: "inception_5a/3x3"
  top: "inception_5a/3x3"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_5a/3x3/relu1"
  type: "ReLU"
  bottom: "inception_5a/3x3"
  top: "inception_5a/3x3"
}

layer {
  name: "inception_5a/output"
  type: "Concat"
  bottom: "inception_5a/1x1"
  bottom: "inception_5a/3x3"
  top: "inception_5a/output"
}


########################################################################
# L6 MiniGoogLeNet.inception_module(x, 96, 64, chanDim)
########################################################################
layer {
  name: "inception_6a/1x1"
  type: "Convolution"
  bottom: "inception_5a/output"
  top: "inception_6a/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_6a/1x1/bn1"
  type: "BatchNorm"
  bottom: "inception_6a/1x1"
  top: "inception_6a/1x1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_6a/1x1/scale1"
  type: "Scale"
  bottom: "inception_6a/1x1"
  top: "inception_6a/1x1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_6a/1x1/relu1"
  type: "ReLU"
  bottom: "inception_6a/1x1"
  top: "inception_6a/1x1"
}

layer {
  name: "inception_6a/3x3"
  type: "Convolution"
  bottom: "inception_5a/output"
  top: "inception_6a/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_6a/3x3/bn1"
  type: "BatchNorm"
  bottom: "inception_6a/3x3"
  top: "inception_6a/3x3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_6a/3x3/scale1"
  type: "Scale"
  bottom: "inception_6a/3x3"
  top: "inception_6a/3x3"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_6a/3x3/relu1"
  type: "ReLU"
  bottom: "inception_6a/3x3"
  top: "inception_6a/3x3"
}

layer {
  name: "inception_6a/output"
  type: "Concat"
  bottom: "inception_6a/1x1"
  bottom: "inception_6a/3x3"
  top: "inception_6a/output"
}

########################################################################
# L7 MiniGoogLeNet.inception_module(x, 80, 80, chanDim)
########################################################################
layer {
  name: "inception_7a/1x1"
  type: "Convolution"
  bottom: "inception_6a/output"
  top: "inception_7a/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 80
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_7a/1x1/bn1"
  type: "BatchNorm"
  bottom: "inception_7a/1x1"
  top: "inception_7a/1x1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_7a/1x1/scale1"
  type: "Scale"
  bottom: "inception_7a/1x1"
  top: "inception_7a/1x1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_7a/1x1/relu1"
  type: "ReLU"
  bottom: "inception_7a/1x1"
  top: "inception_7a/1x1"
}

layer {
  name: "inception_7a/3x3"
  type: "Convolution"
  bottom: "inception_6a/output"
  top: "inception_7a/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 80
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_7a/3x3/bn1"
  type: "BatchNorm"
  bottom: "inception_7a/3x3"
  top: "inception_7a/3x3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_7a/3x3/scale1"
  type: "Scale"
  bottom: "inception_7a/3x3"
  top: "inception_7a/3x3"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_7a/3x3/relu1"
  type: "ReLU"
  bottom: "inception_7a/3x3"
  top: "inception_7a/3x3"
}

layer {
  name: "inception_7a/output"
  type: "Concat"
  bottom: "inception_7a/1x1"
  bottom: "inception_7a/3x3"
  top: "inception_7a/output"
}

########################################################################
# L8 MiniGoogLeNet.inception_module(x, 48, 96, chanDim)
########################################################################
layer {
  name: "inception_8a/1x1"
  type: "Convolution"
  bottom: "inception_7a/output"
  top: "inception_8a/1x1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 48
    kernel_size: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_8a/1x1/bn1"
  type: "BatchNorm"
  bottom: "inception_8a/1x1"
  top: "inception_8a/1x1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_8a/1x1/scale1"
  type: "Scale"
  bottom: "inception_8a/1x1"
  top: "inception_8a/1x1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_8a/1x1/relu1"
  type: "ReLU"
  bottom: "inception_8a/1x1"
  top: "inception_8a/1x1"
}

layer {
  name: "inception_8a/3x3"
  type: "Convolution"
  bottom: "inception_7a/output"
  top: "inception_8a/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 3
    pad: 1
    weight_filler {
      type: "xavier"
      std: 0.03
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "inception_8a/3x3/bn1"
  type: "BatchNorm"
  bottom: "inception_8a/3x3"
  top: "inception_8a/3x3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "inception_8a/3x3/scale1"
  type: "Scale"
  bottom: "inception_8a/3x3"
  top: "inception_8a/3x3"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "inception_8a/3x3/relu1"
  type: "ReLU"
  bottom: "inception_8a/3x3"
  top: "inception_8a/3x3"
}

layer {
  name: "inception_8a/output"
  type: "Concat"
  bottom: "inception_8a/1x1"
  bottom: "inception_8a/3x3"
  top: "inception_8a/output"
...

This file has been truncated, please download it to see its full contents.

googlenet.xmodel

C/C++
The compiled DPU model binary (xmodel)
No preview (download only).

dpu.cpp

C/C++
Invokes the DPU on the KV260 board to extract the 48-bit hash code of an image
#include "dpu.h"

float mean[3] = {104, 117, 123};

static std::vector<float> convert_fixpoint_to_float(vart::TensorBuffer* tensor,
                                                    float scale);
uint64_t runDPU(std::unique_ptr<vart::Runner> &runner, std::string image_file_name);

static cv::Mat read_image(const std::string& image_file_name) {
  // read image from a file
  auto input_image = cv::imread(image_file_name);
  CHECK(!input_image.empty()) << "cannot load " << image_file_name;
  return input_image;
}

static std::unique_ptr<vart::TensorBuffer> create_cpu_flat_tensor_buffer(
    const xir::Tensor* tensor) {
  return std::make_unique<vart::mm::HostFlatTensorBuffer>(tensor);
}

inline std::vector<const xir::Subgraph*> get_dpu_subgraph(
    const xir::Graph* graph) {
  auto root = graph->get_root_subgraph();
  auto children = root->children_topological_sort();
  auto ret = std::vector<const xir::Subgraph*>();
  for (auto c : children) {
    CHECK(c->has_attr("device"));
    auto device = c->get_attr<std::string>("device");
    if (device == "DPU") {
      ret.emplace_back(c);
    }
  }
  return ret;
}

std::unique_ptr<vart::Runner> InitDPU(string model_name) {
    const auto kernel_name = std::string("subgraph_avg_pool_12/8x8_s1");
    auto graph = xir::Graph::deserialize(model_name);
    auto subgraph = get_dpu_subgraph(graph.get());
    auto runner = vart::dpu::DpuRunnerFactory::create_dpu_runner(model_name, kernel_name);
    return runner;
}

uint64_t runDPU(std::unique_ptr<vart::Runner> &runner, std::string image_file_name) {
    auto input_tensors = runner->get_input_tensors();
    auto output_tensors = runner->get_output_tensors();

    // create runner and input/output tensor buffers;
    auto input_scale = vart::get_input_scale(input_tensors);
    auto output_scale = vart::get_output_scale(output_tensors);

    // prepare input tensor buffer
    CHECK_EQ(input_tensors.size(), 1u) << "only support googlenet model";
    auto input_tensor = input_tensors[0];
    auto height = input_tensor->get_shape().at(1);
    auto width = input_tensor->get_shape().at(2);
    auto input_tensor_buffer = create_cpu_flat_tensor_buffer(input_tensor);

    // prepare output tensor buffer
    CHECK_EQ(output_tensors.size(), 1u) << "only support googlenet model";
    auto output_tensor = output_tensors[0];
    auto output_tensor_buffer = create_cpu_flat_tensor_buffer(output_tensor);

    uint64_t data_in = 0u;
    size_t size_in = 0u;
    std::tie(data_in, size_in) = input_tensor_buffer->data(std::vector<int>{0, 0, 0, 0});

    cv::Mat input_image = read_image(image_file_name);
    CHECK(!input_image.empty()) << "cannot load " << image_file_name;
    int8_t* data = (int8_t*)data_in;
    cv::Mat image2;  // cv::resize allocates the output with the source type (CV_8UC3)
    cv::resize(input_image, image2, cv::Size(width, height), 0, 0, cv::INTER_NEAREST);  // cv::Size is (width, height)
    for (int h = 0; h < height; h++) {
      for (int w = 0; w < width; w++) {
        for (int c = 0; c < 3; c++) {
        	float tmp = ((float)image2.at<cv::Vec3b>(h, w)[c]) - mean[c];
          	data[h*width*3+w*3 + c] = (int8_t) ( tmp * input_scale[0]); //in BGR mode
    	    //data_in[h*inWidth*3+w*3 +2-c] = (int8_t) ( tmp * input_scale[0]); //in RGB mode
        }
      }
    }
    auto v = runner->execute_async({input_tensor_buffer.get()}, {output_tensor_buffer.get()});
    auto status = runner->wait((int)v.first, -1);
    CHECK_EQ(status, 0) << "failed to run dpu";
    return generateHash(output_tensor_buffer.get(), output_scale[0]);
}

uint64_t DPU_hash(std::string ImagePath) {
	if(ImagePath == "") {
		std::cout << "--error!, image is empty--" << std::endl;
	}
	std::cout << "----start execute DPU for hashcode extraction-----" << std::endl;
    auto runner = InitDPU("./model/googlenet.xmodel");
    return runDPU(runner, ImagePath);
}

uint64_t generateHash (vart::TensorBuffer* tensor_buffer, float scale) {
	auto sigmoid_input = convert_fixpoint_to_float(tensor_buffer, scale);
	std::cout << "output size = " << sigmoid_input.size() << std::endl;
	// apply a sigmoid to each of the 48 network outputs
	std::vector<float> sigmoid(sigmoid_input.size());
	int i = 0;
	for(auto val : sigmoid_input) {
		sigmoid[i] = 1. / (1. + exp(-val));
		i++;
	}
	// threshold at 0.5 and pack the 48 bits into the upper bits of a 64-bit hash code
	uint64_t hashcode = 0;
	for (int i = 0; i < 48 && i < (int)sigmoid.size(); i++) {
		hashcode = sigmoid[i] > 0.5 ? (hashcode | ((uint64_t)1 << (63-i))) : (hashcode & ~((uint64_t)1 << (63-i)));
	}
	printf("val = %#018" PRIx64 "\n", hashcode);
	return hashcode;
}


static std::vector<float> convert_fixpoint_to_float(
    vart::TensorBuffer* tensor_buffer, float scale) {
  uint64_t data = 0u;
  size_t size = 0u;
  std::tie(data, size) = tensor_buffer->data(std::vector<int>{0, 0});
  signed char* data_c = (signed char*)data;
  auto ret = std::vector<float>(size);
  transform(data_c, data_c + size, ret.begin(),
            [scale](signed char v) { return ((float)v) * scale; });
  return ret;
}
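
For reference, the 48-bit codes produced by generateHash() can be compared on the CPU with a plain Hamming-distance ranking; this is the operation the graph-search accelerator performs in hardware over the whole database. Below is a minimal CPU-side sketch under the assumption that the database hashes are already loaded into a std::vector<uint64_t>; the names hamming_distance and cpu_topk are ours and are not part of the project sources.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hamming distance between two 48-bit hash codes packed into the upper bits of a uint64_t
static inline int hamming_distance(uint64_t a, uint64_t b) {
  return __builtin_popcountll(a ^ b);  // GCC/Clang builtin
}

// CPU reference: indices of the k database hashes closest to the query hash
std::vector<size_t> cpu_topk(uint64_t query, const std::vector<uint64_t>& database, size_t k) {
  std::vector<size_t> idx(database.size());
  for (size_t i = 0; i < idx.size(); i++) idx[i] = i;
  k = std::min(k, idx.size());
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](size_t a, size_t b) {
                      return hamming_distance(query, database[a]) <
                             hamming_distance(query, database[b]);
                    });
  idx.resize(k);
  return idx;
}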

accelerator.cpp

C/C++
Invokes the graph-search accelerator implemented in the programmable logic
#include "accelerator.h"

#include "time.h"

#define KG_REG_ADDR 0xA0000000
#define KG_GRAPH_ADDR 0X25000000
#define KG_RESULT_ADDR 0X3F000000

#define REG_SIZE 4096UL
#define REG_MASK (REG_SIZE - 1)
#define KG_SIZE    0x1000000
#define KG_MASK    (KG_SIZE - 1)


#define SLV_REG0_OFFSET (0*4)
#define SLV_REG1_OFFSET (1*4)
#define SLV_REG2_OFFSET (2*4)
#define SLV_REG3_OFFSET (3*4)

#define KGRAPH_FILE      "kv260_out_end.bin"
#define KGRAPH_SIZE       12800512
#define KGRAPH_RESULT_SIZE 2000

void *reg_vaddr, *kg_graph_vaddr, *kg_result_vaddr, *ptr;
unsigned int KGraph_Result[500];


void KGraph_Open()
{
	reg_vaddr       = (void *)get_vaddr(KG_REG_ADDR, REG_SIZE, REG_MASK);
	kg_graph_vaddr  = (void *)get_vaddr(KG_GRAPH_ADDR, KG_SIZE, KG_MASK);
	kg_result_vaddr = (void *)get_vaddr(KG_RESULT_ADDR, KG_SIZE, KG_MASK);

	printf("------kgraph weight map start--------\n");
    /********************weight map***************************/
	if(kgraph_memcpy(kg_graph_vaddr)==-1){
		printf("------kgraph weight map fail-----\n");
		exit(1);
		//return -1;
	}
	printf("------kgraph weight map end-----------\n");
}

void KGraph_Close()
{
    // unmap the memory before exiting
    if (munmap(reg_vaddr, REG_SIZE) == -1 || munmap(kg_graph_vaddr, KG_SIZE) == -1 || munmap(kg_result_vaddr, KG_SIZE) == -1) {
        printf("Can't unmap memory from user space.\n");
        exit(0);
    }

	printf("------Accelerator is end------------\n\r");
}

unsigned int * Run_KGraph(uint64_t hash_code)
{
	unsigned int* KGraph_Res = new unsigned int[500];
	//write_reg(SLV_REG1_OFFSET, 0xc7d8d870);
	//write_reg(SLV_REG2_OFFSET, 0x6c110000);

	write_reg(SLV_REG1_OFFSET, (uint32_t)(hash_code >> 32));  //00f937ae2973 train code googlenet
	write_reg(SLV_REG2_OFFSET, (uint32_t)(hash_code));  //7d31015c8ade

	write_reg(SLV_REG3_OFFSET, 0x000000C8);
	printf("------Accelerator start---------------\n");
	write_reg(SLV_REG0_OFFSET, 0x00000002);
	write_reg(SLV_REG0_OFFSET, 0x00000000);
	usleep(10000);
	// poll the control register until the accelerator reports completion
	while(1)
	{
		//printf("0x%08x\n", read_reg(SLV_REG0_OFFSET));
		if(read_reg(SLV_REG0_OFFSET) == 0x00000000)
		{
			printf("----accelerator end------\n");
			break;
		}
	}
   	ptr = memcpy(KGraph_Res,kg_result_vaddr,KGRAPH_RESULT_SIZE);
	printf("result:");
	int i=0;
	for(i=0;i<100;i++) {
		printf("result=%d\n",KGraph_Res[i]);
	}

	return KGraph_Res;
}




int accelerator_init()
{
	reg_vaddr       = (void *)get_vaddr(KG_REG_ADDR, REG_SIZE, REG_MASK);
	kg_graph_vaddr  = (void *)get_vaddr(KG_GRAPH_ADDR, KG_SIZE, KG_MASK);
	kg_result_vaddr = (void *)get_vaddr(KG_RESULT_ADDR, KG_SIZE, KG_MASK);

	printf("------kgraph weight map start--------\n");
    /********************weight map***************************/
	if(kgraph_memcpy(kg_graph_vaddr)==-1){
		printf("------kgraph weight map fail-----\n");
		return -1;
	}
	printf("------kgraph weight map end-----------\n");

	printf("------input hash code and User K------\n");

	//write_reg(SLV_REG1_OFFSET, 0xc7d8d870);
	//write_reg(SLV_REG2_OFFSET, 0x6c110000);

	write_reg(SLV_REG1_OFFSET, 0x084eabf7);
	write_reg(SLV_REG2_OFFSET, 0x1d780000);

	write_reg(SLV_REG3_OFFSET, 0x000000C8);
	clock_t start = clock();
	printf("------Accelerator start---------------\n");
	write_reg(SLV_REG0_OFFSET, 0x00000002);
	write_reg(SLV_REG0_OFFSET, 0x00000000);
	// poll the control register until the accelerator reports completion
	while(1)
	{
		printf("0x%08x\n", read_reg(SLV_REG0_OFFSET));
		if(read_reg(SLV_REG0_OFFSET) == 0x00000000)
		{
			printf("----accelerator end------\n");
			break;
		}
	}
	clock_t end = clock();
	printf("execution time=%f\n", (double)(end-start)/CLOCKS_PER_SEC);
	usleep(10000);
	printf("0x%08x\n", read_reg(SLV_REG0_OFFSET));
   	ptr = memcpy(KGraph_Result,kg_result_vaddr,KGRAPH_RESULT_SIZE);
	printf("result:");
	int i=0;
	for(i=0;i<100;i++) {
		printf("result=%d\n",KGraph_Result[i]);
	}
    // unmap the memory before exiting
    if (munmap(reg_vaddr, REG_SIZE) == -1 || munmap(kg_graph_vaddr, KG_SIZE) == -1 || munmap(kg_result_vaddr, KG_SIZE) == -1) {
        printf("Can't unmap memory from user space.\n");
        exit(0);
    }

	printf("------Accelerator is end------------\n\r");
	printf("\n");


}

void * get_vaddr(uint32_t BASE_ADDR, uint32_t SIZE,uint32_t MASK)
{
    printf("== START: AXI FPGA test ==\n");

    int memfd;
    void *mapped_base, *mapped_dev_base;
    off_t dev_base = BASE_ADDR;

    memfd = open("/dev/mem", O_RDWR | O_SYNC);
        if (memfd == -1) {
        printf("Can't open /dev/mem.\n");
        exit(0);
    }

    printf("/dev/mem opened.\n");

    // Map one page of memory into user space such that the device is in that page, but it may not
    // be at the start of the page.
    mapped_base = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, dev_base & ~MASK);
        if (mapped_base == (void *) -1) {
        printf("Can't map the memory to user space.\n");
        exit(0);
    }
    printf("Memory mapped at address %p.\n", mapped_base);

    // get the address of the device in user space which will be an offset from the base
    // that was mapped as memory is mapped at the start of a page
    mapped_dev_base = (char *)mapped_base + (dev_base & MASK);

    return mapped_dev_base;
}

unsigned int write_reg(int reg_num, unsigned int reg_value){

	*((volatile uint32_t *) ((char *)reg_vaddr + reg_num)) = reg_value;

	return 0;
}

unsigned int read_reg(int reg_num){

	return *((volatile uint32_t *) ((char *)reg_vaddr + reg_num));
}


int kgraph_memcpy(void *ptr){
	FILE *file_in;
	file_in  = fopen(KGRAPH_FILE,"rb");  // graph weights are binary data, open in binary mode
	if(file_in==NULL){
		printf("open file_in error!\n");
		return -1;
	}
	printf("---------1--------------\n");
	char *w = (char*)malloc(KGRAPH_SIZE);
	if(!w)
	   printf("kgraph error\n");
	printf("---------2--------------\n");
	if(fread(w,1,KGRAPH_SIZE,file_in) != (size_t)KGRAPH_SIZE)
		printf("kgraph read error\n");
	fclose(file_in);
	printf("---------3--------------\n");
	clock_t start,end;
	start = clock();
	printf("---------4--------------\n");
	ptr = memcpy((ptr),w,KGRAPH_SIZE);
	printf("---------5--------------\n");
	free(w);
	end = clock();
	printf("kgraph weight memcpy success: time=%lf\n",(double)(end-start)/CLOCKS_PER_SEC);
	return 0;
}
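
A minimal usage sketch of the accelerator interface defined above: KGraph_Open() maps the control registers and copies the graph weights (kv260_out_end.bin) into the reserved DDR region, Run_KGraph() writes the 64-bit hash code into SLV_REG1/SLV_REG2 and the result count (0xC8 = 200) into SLV_REG3, pulses SLV_REG0 to start, polls until the accelerator finishes, and returns the matching image IDs; KGraph_Close() unmaps everything. The example hash code below is the one hard-coded in accelerator_init(); main.cpp drives the same sequence from the web server.

#include <cstdint>
#include <cstdio>
#include "accelerator.h"

int main() {
  KGraph_Open();                                // mmap registers and load the graph into DDR

  uint64_t hash_code = 0x084eabf71d780000ULL;   // example 48-bit hash code padded into a uint64_t
  unsigned int* result_ids = Run_KGraph(hash_code);

  for (int i = 0; i < 100; i++)                 // top matching database image IDs
    printf("result[%d] = %u\n", i, result_ids[i]);

  delete[] result_ids;                          // Run_KGraph allocates the buffer with new[]
  KGraph_Close();                               // unmap register and buffer regions
  return 0;
}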

main.cpp

C/C++
The web framework (Crow) code
/*
 * Copyright 2019 Xilinx Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <atomic>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <chrono>
#include <cinttypes>
#include <cmath>
#include <cstdio>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <queue>
#include <mutex>
#include <string>
#include <vector>
#include <thread>

#include "../include/accelerator.h"
#include "../include/crow_all.h"
#include "../include/dpu.h"

using namespace std::chrono;

const string baseImagePath = "/home/petalinux/cifar10/";
string query_image_path = baseImagePath + "batch1/7331.jpg";
vector<string> cifar10_file_path;

void init()
{
	ifstream infile_database;
	infile_database.open("/home/petalinux/cifar10/train.txt");

	string s;
	while(getline(infile_database, s))
	{
		s = s.substr(0, s.find(" "));
		cifar10_file_path.push_back(s);
	}
}

vector<string> run()
{
	uint64_t hash_code;
	vector<string> result_file_path;

	auto dpu_start = system_clock::now();

    hash_code = DPU_hash(query_image_path);

    auto dpu_end = system_clock::now();
    auto dpu_duration = (duration_cast<microseconds>(dpu_end - dpu_start)).count();
    cout << "[DPU Time]" << dpu_duration << "us" << endl;


    std::cout << "----DPU for hashcode extraction end-----" << std::endl;
    printf("hash_code = %#018"PRIx64"\n", hash_code);
    printf("hash_code = %"PRIx32"\n", (uint32_t)(hash_code>>32));
    printf("hash_code = %"PRIx32"\n", (uint32_t)(hash_code));

    unsigned int *Result_ID;

    auto kgraph_start = system_clock::now();

    Result_ID = Run_KGraph(hash_code);

    auto kgraph_end = system_clock::now();
    auto kgraph_duration = (duration_cast<microseconds>(kgraph_end - kgraph_start)).count();
    cout << "[KGraph Time]" << kgraph_duration << "us" << endl;

	for(int i=0;i<100;i++) {
		result_file_path.push_back(baseImagePath + cifar10_file_path.at(Result_ID[i]));
		printf("result=%d\n",Result_ID[i]);
	}
	delete[] Result_ID;

	return result_file_path;
}

int main(int argc, char **argv)
{
    printf("== START: AXI FPGA test ==\n");
    init();
    KGraph_Open();

    cout << "Hello World!" << endl;

    crow::SimpleApp app;
    crow::mustache::set_base(".");

    CROW_ROUTE(app, "/")
    ([]{
        crow::mustache::context ctx;
        return crow::mustache::load("./Web/a_test.html").render();
    });

    CROW_ROUTE(app, "/test")
    ([](const crow::request& /*req*/, crow::response& res){
        string key= "Access-Control-Allow-Origin";
        string value = "*";
        res.add_header(key,value);
        crow::json::wvalue x;
        vector<string> result;

        result = run();

        vector<string>::iterator it;
        int i=0;
        for(it = result.begin();it != result.end() ; it++)
        {
            string str = *it;
            replace(str.begin(),str.end(),'/','+');
            x["img_path"][i] = str;
            i=i+1;
        }
        vector<string>(result).swap(result);
        res.write(crow::json::dump(x));
        res.end();
        //return crow::response(x);
    });
    CROW_ROUTE(app,"/add")
    ([](const crow::request& /*req*/, crow::response& res){
        std::ostringstream os;
        std::ifstream fin("img/thumbs/2-3.jpg",std::ios::binary);
        os << fin.rdbuf();
        res.set_header("Content-Type","image/jpeg");
        res.write(os.str());
        res.end();
    });
    CROW_ROUTE(app,"/img/<string>")
    ([](string a){
        replace(a.begin(),a.end(),'+','/');
        crow::response res;
        std::ostringstream os;
        std::ifstream fi_1(a,std::ios::binary);
        os << fi_1.rdbuf();
        res.set_header("Content-Type","image/jpeg");
        res.write(os.str());
        return res;
    });
    CROW_ROUTE(app,"/herf/<string>")
    ([](string a){
        crow::response res;
        std::ostringstream os;
        //std::ifstream fi_1("img/"+a,std::ios::binary);
        os << a;
        res.set_header("Content-Type","text/html");
        res.write(os.str());
        return res;
    });
    CROW_ROUTE(app,"/css")
    ([](){
        crow::response res;
        std::ostringstream os;
        std::ifstream fi_1("/home/petalinux/Web/css/baguetteBox.css",std::ios::binary);
        os << fi_1.rdbuf();
        res.set_header("Content-Type","text/css");
        res.write(os.str());
        return res;
    });
    CROW_ROUTE(app,"/style")
    ([](){
        crow::response res;
        std::ostringstream os;
        std::ifstream fi_1("/home/petalinux/Web/css/style.css",std::ios::binary);
        os << fi_1.rdbuf();
        res.set_header("Content-Type","text/css");
        res.write(os.str());
        return res;
    });
    CROW_ROUTE(app,"/box")
    ([](){
        crow::response res;
        std::ostringstream os;
        std::ifstream fi_1("/home/petalinux/Web/js/baguetteBox.js",std::ios::binary);
        os << fi_1.rdbuf();
        res.set_header("Content-Type","application/x-javascript");
        res.write(os.str());
        return res;
    });
    CROW_ROUTE(app, "/upload")
        .methods("GET"_method, "POST"_method)
    ([](const crow::request& req)
    {
        string tokens[6] ={"name=\"","\"; filename=\"","\"\r\n","Content-Type: ","\r\n\r\n","\r\n------WebKitFormBoundary"};
        int position[6];
        for(int i=0;i<6;i++)
        {
            position[i] = req.body.find(tokens[i]);
            //CROW_LOG_INFO << "position" <<i<<"="<<position[i];
        }
        string name = req.body.substr(position[0]+tokens[0].length(),position[1]-position[0]-tokens[0].length());
        string filename = req.body.substr(position[1]+tokens[1].length(),position[2]-position[1]-tokens[1].length());
        string ContentType = req.body.substr(position[3]+tokens[3].length(),position[4]-position[3]-tokens[3].length());
        string filecontent = req.body.substr(position[4]+tokens[4].length(),position[5]-position[4]-tokens[4].length());
        string final_string = req.body.substr(position[5]);
        query_image_path = "/home/petalinux/userupload/" + filename;
        /*CROW_LOG_INFO << "name=" <<name;
        CROW_LOG_INFO << "filename=" <<filename;
        CROW_LOG_INFO << "Content-Type=" <<ContentType;
        CROW_LOG_INFO << "final_string=" <<final_string;*/
        std::ofstream file(query_image_path, std::ios::binary);
        file.write(reinterpret_cast<const char*>(filecontent.c_str()),filecontent.length());
        file.close();
        return "aa";
    });
    CROW_ROUTE(app,"/icon")
    ([](){
        crow::response res;
        std::ostringstream os;
        std::ifstream fi_1("/home/petalinux/Web/icon.png",std::ios::binary);
        os << fi_1.rdbuf();
        res.set_header("Content-Type","image/jpeg");
        res.write(os.str());
        return res;
    });
    CROW_ROUTE(app,"/query_image")
    ([](){
        crow::response res;
        std::ostringstream os;
        std::ifstream fi_1("/home/petalinux/Web/Query_Image.png",std::ios::binary);
        os << fi_1.rdbuf();
        res.set_header("Content-Type","image/jpeg");
        res.write(os.str());
        return res;
    });
    app.port(50080)
        .multithreaded()
        .run();

    //DPU_close();
    KGraph_Close();

    printf("== STOP ==\n");

    return 0;
}
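
One detail of the Crow routes above: because each result path is embedded as a single URL segment for the /img/<string> route, the /test handler replaces every '/' with '+' before returning the JSON, and the /img handler reverses the substitution before opening the file. The standalone sketch below only illustrates that round trip; encode_path and decode_path are our names, not part of the project sources.

#include <algorithm>
#include <cassert>
#include <string>

// Encode a filesystem path so it fits into one URL segment of the /img/<string> route
std::string encode_path(std::string path) {
  std::replace(path.begin(), path.end(), '/', '+');
  return path;
}

// Decode the URL segment back into a filesystem path before reading the image
std::string decode_path(std::string segment) {
  std::replace(segment.begin(), segment.end(), '+', '/');
  return segment;
}

int main() {
  std::string original = "/home/petalinux/cifar10/batch1/7331.jpg";
  assert(decode_path(encode_path(original)) == original);  // the round trip is lossless
  return 0;
}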

An-AI-driven-Search-by-Image-Engine-on-KV260

Credits

水木刀

Xinyu Chen
PhD student at National University of Singapore.