CHOSUN

High-Performance Ternary Content Addressable Memory Architecture for FPGAs

Author(s)
인야앳 울라
Issued Date
2018
Keyword
field-programmable gate array (FPGA), SRAM-based TCAM, memory architecture
Abstract
Ternary content addressable memory (TCAM) allows data to be searched by content instead of by address. It searches all stored data in parallel and outputs the matching address in a single cycle. Traditional TCAMs designed as ASICs suffer from low memory density, high power consumption, high cost, and limited configurability. Recently, FPGAs have emerged as an attractive platform for implementing TCAMs using on-chip memory blocks. However, existing SRAM-based TCAM architectures on FPGAs suffer from low memory efficiency, high energy consumption, and high-latency blocking updates. This thesis presents three separate approaches, one for each of these limitations in existing FPGA-based TCAMs.
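As an illustration of the ternary matching described above, the following is a minimal software sketch, not the hardware design of the thesis. It assumes each entry is stored as a hypothetical (value, care mask) pair, where a care bit of 0 marks a don't-care position, and it assumes that the lowest matching address wins, as a priority encoder commonly would. A hardware TCAM compares all entries in parallel; the loop below only models the result.

```python
from typing import List, Optional, Tuple

# (value, care_mask): a care bit of 1 means "compare this bit",
# a care bit of 0 means "don't care" (the ternary X state).
Entry = Tuple[int, int]


def tcam_search(table: List[Entry], key: int) -> Optional[int]:
    """Return the lowest address whose entry matches key, or None."""
    for addr, (value, care) in enumerate(table):
        # Bits that differ and are cared about must all be zero.
        if (key ^ value) & care == 0:
            return addr
    return None


# Illustrative 4-bit table (assumed data, not from the thesis).
table = [
    (0b1010, 0b1111),  # 1010
    (0b1000, 0b1100),  # 10xx
    (0b0000, 0b0000),  # xxxx (matches everything)
]
```

For example, the key 1011 does not match the exact entry 1010 but does match 10xx, so the search returns address 1.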
A multipumping-enabled multiported SRAM-based TCAM architecture that achieves efficient memory utilization is proposed. The simple dual-port BRAMs of the design are configured as multiported SRAM using the multipumping technique, which clocks the SRAMs at a higher internal frequency so that the sub-blocks of a BRAM can be accessed within one system cycle. The independent shallow SRAM sub-blocks of the multiported SRAM, when used to implement a TCAM table, achieve higher memory efficiency.
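To see why shallower SRAM blocks help, the sketch below models the general chunked SRAM emulation of a TCAM. This is an assumed, commonly used mapping, not necessarily the exact one in the thesis. The key is split into chunks, each chunk addresses its own SRAM, every SRAM row holds a bit-vector of the rules that match that chunk value, and the vectors are ANDed. The function names and the example sizes are hypothetical, and the multipumped clocking itself is not modeled.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value, care_mask), care bit 0 = don't care


def build_sram_tcam(table: List[Entry], width: int, chunk_bits: int) -> List[List[int]]:
    """One SRAM per chunk; srams[c][addr] is a bit-vector of matching rule indices."""
    assert width % chunk_bits == 0
    chunk_mask = (1 << chunk_bits) - 1
    srams = []
    for c in range(width // chunk_bits):
        shift = c * chunk_bits
        rows = []
        for addr in range(1 << chunk_bits):
            vec = 0
            for i, (value, care) in enumerate(table):
                v = (value >> shift) & chunk_mask
                m = (care >> shift) & chunk_mask
                if (addr ^ v) & m == 0:
                    vec |= 1 << i  # rule i accepts this chunk value
            rows.append(vec)
        srams.append(rows)
    return srams


def search(srams: List[List[int]], key: int, chunk_bits: int, n_rules: int) -> Optional[int]:
    """AND the per-chunk match vectors and return the lowest matching rule."""
    chunk_mask = (1 << chunk_bits) - 1
    match = (1 << n_rules) - 1
    for c, rows in enumerate(srams):
        match &= rows[(key >> (c * chunk_bits)) & chunk_mask]
    if match == 0:
        return None
    return (match & -match).bit_length() - 1  # index of lowest set bit


def sram_bits(width: int, depth: int, chunk_bits: int) -> int:
    """Total SRAM bits: one 2**chunk_bits-deep, depth-wide SRAM per chunk."""
    return (width // chunk_bits) * (1 << chunk_bits) * depth
```

Under these assumptions, a 36-bit, 64-rule table costs `sram_bits(36, 64, 9)` = 131072 bits with 9-bit chunks but only `sram_bits(36, 64, 6)` = 24576 bits with 6-bit chunks, because the per-chunk cost grows exponentially with the chunk width.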
A pre-classifier-based architecture for a low-power SRAM-based TCAM is presented. The TCAM table is divided into several disjoint sub-tables in the first, pre-classification stage, followed by an SRAM-based implementation stage for each TCAM sub-table. At most one TCAM sub-table is selectively activated for each incoming TCAM word, which substantially reduces power consumption compared with existing TCAM designs that have no pre-classifier.
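The selective activation can be sketched in software as follows. The abstract does not state how rules are assigned to sub-tables, so this sketch assumes a hypothetical classifier that uses the top `class_bits` bits of each rule and requires those bits to be fully specified. The second return value, the number of entries actually compared, is only a rough stand-in for the activated memory.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, int]  # (value, care_mask), care bit 0 = don't care
SubTables = Dict[int, List[Tuple[int, int, int]]]  # class -> [(addr, value, care)]


def build_subtables(table: List[Entry], width: int, class_bits: int) -> SubTables:
    """Split the table into disjoint sub-tables keyed by the top class_bits bits."""
    shift = width - class_bits
    top_mask = (1 << class_bits) - 1
    subtables: SubTables = {}
    for addr, (value, care) in enumerate(table):
        # Assumption of this sketch: classifier bits contain no don't-cares,
        # so every rule belongs to exactly one sub-table.
        if (care >> shift) & top_mask != top_mask:
            raise ValueError("rule has a don't-care in the classifier bits")
        subtables.setdefault(value >> shift, []).append((addr, value, care))
    return subtables


def search(subtables: SubTables, key: int, width: int,
           class_bits: int) -> Tuple[Optional[int], int]:
    """Search only the sub-table selected by the key; return (address, entries compared)."""
    sub = subtables.get(key >> (width - class_bits))
    if sub is None:
        return None, 0  # no sub-table is activated at all
    for addr, value, care in sub:
        if (key ^ value) & care == 0:
            return addr, len(sub)
    return None, len(sub)
```

With four 4-bit rules spread over three classes, a lookup compares against at most two entries instead of all four; the saving grows with the number of sub-tables when the rules are spread evenly.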
A technique for implementing a dynamically reconfigurable TCAM in SRAM-based FPGAs with higher energy efficiency and resource utilization efficiency is proposed. The implementation employs the FPGA's distributed RAM resources, with the LUTRAMs of a SLICEM configured as quad-port RAM. It dynamically reconfigures only the LUTRAM associated with the word being updated, while allowing search operations to be performed at the same time.
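The idea of touching only the storage that belongs to the updated word can be illustrated with a software analogy. This is a sketch under assumptions, not the LUTRAM circuit of the thesis: the table is held as per-chunk match vectors, and updating rule `index` rewrites only bit `index` of each row, leaving the bits of every other rule intact so that lookups of those rules remain valid. All function names are hypothetical.

```python
from typing import List, Optional


def empty_srams(width: int, chunk_bits: int) -> List[List[int]]:
    """One memory per chunk, 2**chunk_bits rows each, no rules stored yet."""
    return [[0] * (1 << chunk_bits) for _ in range(width // chunk_bits)]


def update_rule(srams: List[List[int]], index: int, value: int, care: int,
                chunk_bits: int) -> None:
    """Write or overwrite rule `index`, changing only bit `index` of each row."""
    chunk_mask = (1 << chunk_bits) - 1
    bit = 1 << index
    for c, rows in enumerate(srams):
        v = (value >> (c * chunk_bits)) & chunk_mask
        m = (care >> (c * chunk_bits)) & chunk_mask
        for addr in range(len(rows)):
            if (addr ^ v) & m == 0:
                rows[addr] |= bit   # rule accepts this chunk value
            else:
                rows[addr] &= ~bit  # rule rejects this chunk value


def search(srams: List[List[int]], key: int, chunk_bits: int) -> Optional[int]:
    """AND the selected row of every chunk; return the lowest matching rule."""
    chunk_mask = (1 << chunk_bits) - 1
    match = -1  # all ones
    for c, rows in enumerate(srams):
        match &= rows[(key >> (c * chunk_bits)) & chunk_mask]
    if match == 0:
        return None
    return (match & -match).bit_length() - 1


# Illustrative use: store two 4-bit rules, then replace rule 1 only.
srams = empty_srams(4, 2)
update_rule(srams, 0, 0b1010, 0b1111, 2)  # rule 0: 1010
update_rule(srams, 1, 0b1000, 0b1100, 2)  # rule 1: 10xx
```

Replacing rule 1 with 01xx through `update_rule(srams, 1, 0b0100, 0b1100, 2)` leaves the stored bits of rule 0 unchanged, so a search for 1010 returns address 0 both before and after the update.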
The proposed TCAM design methodologies are implemented on a Xilinx Virtex-6 FPGA. Compared with existing FPGA-based TCAMs, the proposed multipumping-enabled multiported SRAM-based TCAM architecture achieves up to 2.85 times better performance per memory. The experimental results for the proposed pre-classification-based TCAM architecture show at least three times lower power consumption per performance than other existing SRAM-based TCAM architectures. Compared with existing SRAM-based TCAMs, the proposed LUTRAM-based dynamically reconfigurable TCAM architecture has a lower, single-cycle search latency and achieves at least 2.5 times better energy-delay product efficiency and 67% higher performance per area. Thus, the proposed TCAM architectures show better performance than other existing FPGA-based TCAM architectures.

Ternary content-addressable memory (TCAM) is used to design high-speed search engines. TCAM is implemented on application-specific integrated circuit platforms (native TCAM) and on FPGA platforms (static RAM (SRAM)-based TCAM). The search-space requirements of TCAM applications are constantly growing. However, existing implementations of TCAM on field-programmable gate arrays (FPGAs) suffer from storage inefficiency. Both native TCAMs and SRAM-based TCAMs have the drawback of high power consumption. TCAMs designed using SRAM-based FPGAs offer promising lookup performance. However, the update process of the TCAM table poses a significant challenge to the efficient use of SRAM-based TCAMs. SRAM-based TCAMs for FPGAs pause search operations during high-latency update operations, which makes them infeasible for applications that require frequent updates. This thesis presents three separate approaches to overcome the stated limitations of existing FPGA-based TCAMs, namely storage inefficiency, high power consumption, and high-latency blocking updates.

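A multipumping-enabled multiported SRAM-based TCAM design on FPGAs is presented to achieve efficient utilization of SRAM memory. Existing SRAM-based solutions for TCAM use cascaded block RAMs (BRAMs) on FPGAs to reduce the effect of an increase in the traditional TCAM pattern width from an exponential increase in memory usage to a linear one. However, BRAMs on state-of-the-art FPGAs have a minimum depth limit, which limits the storage efficiency of TCAM bits. The proposed solution avoids this limitation by mapping regions of the traditional TCAM table to shallow sub-blocks of the configured BRAMs, which yields a memory-efficient TCAM design. The proposed solution operates simple dual-port BRAMs configured as multiported SRAM using the multipumping technique, clocking them at a higher internal clock frequency to access the sub-blocks of a BRAM in a single system cycle. We implemented the proposed design on a Virtex-6 FPGA device. Compared with existing FPGA-based TCAM designs, the proposed method achieves up to 2.85 times better performance per memory.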

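A pre-classifier-based architecture for a low-power SRAM-based TCAM is presented. The first, classification stage divides the TCAM table into several sub-tables of balanced size. The second, SRAM-based implementation stage maps each of the resulting TCAM sub-tables to a separate row of the SRAM blocks configured in the architecture. The proposed architecture selectively activates at most one row of SRAM blocks for each incoming TCAM word. Compared with existing SRAM-based TCAM designs, the proposed design activates only the part of the SRAM memory used for the lookup, rather than the entire SRAM memory as in previous schemes, and therefore significantly reduces energy consumption. We implemented the proposed approach on a Xilinx Virtex-6 FPGA. Experimental results show that the proposed design achieves at least three times lower power consumption per performance than other SRAM-based TCAM architectures.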

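An FPGA-based, dynamically updateable, energy- and resource-efficient TCAM design (DURE) is presented. DURE exploits the distributed RAM resources of the FPGA. More specifically, the lookup table RAMs (LUTRAMs) available in SLICEM resources are configured as quad-port RAM, which forms the basic memory (BM) block of the DURE implementation. The contents of the TCAM table are divided into equal-sized chunks and mapped to the LUTRAMs of the proposed BM blocks. DURE implements dynamic updates by reconfiguring only the LUTRAMs of the BM block associated with the word being updated, so search and update operations can be performed simultaneously. Compared with existing SRAM-based TCAMs, DURE has a shorter, single-cycle search latency, at least 2.5 times better energy-delay product efficiency, and 67% higher performance per area.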

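This thesis presented three high-performance TCAM architectures that achieve higher memory efficiency, achieve better energy-delay product efficiency, and implement dynamic updates. The proposed solutions are general and can be applied to many applications. Future work therefore includes adopting these designs in a variety of applications for further evaluation.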
Alternative Title
FPGA 기반 고성능 TCAM 구조 (FPGA-Based High-Performance TCAM Architecture)
Alternative Author(s)
Inayat Ullah
Affiliation
Department of Computer Engineering
Department
General Graduate School, Department of Computer Engineering
Advisor
이정아
Awarded Date
2019-02
Table Of Contents
I. INTRODUCTION
A. Background
B. Problem Statement
C. Motivations
D. Thesis Contributions
E. Related Work
F. Thesis Organization

II. MULTIPUMPING ENABLED MULTIPORTED SRAM-BASED TCAM ARCHITECTURE
A. Multipumping-Enabled Multiported SRAM
1. Basic Idea
2. Proposed Partitioning of Traditional TCAM Table
3. Basic Architecture of the Proposed TCAM Memory
4. Modular Architecture
5. Effect of Multipumping SRAM on the Memory Usage and Throughput
B. Implementation Setup and Results
C. Performance Evaluation & Comparison
1. SRAM Memory Utilization
2. Throughput
3. Performance per Memory

III. PRE-CLASSIFICATION-BASED ENERGY-EFFICIENT SRAM-BASED TCAM ARCHITECTURE
A. Proposed Classification Scheme
B. EE-TCAM Proposed Architecture
C. EE-TCAM FPGA Implementation & Results
D. EE-TCAM Performance Evaluation & Comparison
1. Scalability of EE-TCAM
2. Performance Trade-Off with Increase in the Number of TCAM Sub-Tables (M)
3. Power Consumption
4. Power Consumption Per Performance Comparison

IV. DYNAMICALLY RE-CONFIGURABLE ENERGY- AND RESOURCE-EFFICIENT TCAM ARCHITECTURE FOR FPGAS
A. Hardware Architecture of Proposed TCAM: DURE
1. Basic Idea
2. Building Blocks of DURE on FPGA
3. Architecture of the Proposed DURE
4. Dynamic Update
B. FPGA Implementation and Results
C. DURE Performance Evaluation and Comparison
1. Scalability of DURE
2. Search Latency
3. Update Rate
4. FPGA Resource Utilization
5. Performance per Area
6. Energy-Delay Product

V. CONCLUSIONS & FUTURE WORK

REFERENCES

ACKNOWLEDGEMENTS
Degree
Doctor
Publisher
Chosun University, Graduate School of Engineering
Citation
인야앳 울라. (2018). High-Performance Ternary Content Addressable Memory Architecture for FPGAs.
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/13713
http://chosun.dcollection.net/common/orgView/200000267064
Appears in Collections:
General Graduate School > 4. Theses(Ph.D)
Authorize & License
  • Authorize: Open
  • Embargo: 2019-02-08
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.