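OCR (Optical Character Recognition)
Problem statement
- Text detection
- Text recognition
Text detection (localization)
Characteristics
- IoU is not a good criterion; detecting only part of the text is also acceptable
- Various fonts, colors, languages, backgrounds, etc.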
- perspective transformation
- layouts
- word/line level
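To make the IoU point concrete, here is a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2). The function name `iou` and the example boxes are made up for illustration. A detection that covers only part of a text line can fall below a commonly used 0.5 IoU threshold even though the detected part may still be readable.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth text line and a detection covering its left 40%.
ground_truth = (0, 0, 100, 20)
partial_detection = (0, 0, 40, 20)
partial_iou = iou(ground_truth, partial_detection)  # 800 / 2000 = 0.4
```

The partial detection lies entirely inside the ground truth, yet it scores 0.4, which is why a plain IoU threshold can be too strict for text.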
methods
- traditional method
- deep learning
- RPN based, detection
- FCN based, segmentation
Note: Scene text detection via holistic, multi-channel prediction
Text recognition
method
- CNN/MDLSTM + RNN + CTC
- Sequence to Sequence with Attention
- Combine CTC and Attention
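As an illustration of the attention option in the list above, here is a minimal sketch of one attention step. It assumes dot-product scoring, which these notes do not specify; the names `softmax` and `attend` and the toy vectors are hypothetical. At each decoding step the decoder state (the query) is scored against every encoder state, and the softmax-normalized scores weight the encoder states into a context vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # Dot-product score between the query and each encoder state (assumed scoring).
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted sum of encoder states.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy encoder outputs for three image columns and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

With this query, the first and third encoder states receive equal, larger weights than the second, so the context vector is dominated by those two positions.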
note: CTC is used to align the text (per-frame predictions with the label sequence)
note: LSTM -> GRU -> EURNN
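The alignment role of CTC mentioned in the note above can be sketched through its decoding rule: merge consecutive repeated labels, then drop the blank symbol. This is a minimal sketch of greedy (best-path) decoding only, not the CTC training loss; the function name `ctc_greedy_decode` and the use of "-" as the blank are assumptions for illustration.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse a per-frame label sequence into an output sequence.

    Consecutive duplicates are merged first, then blanks are removed,
    so a blank between two identical labels keeps both of them.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Per-frame argmax labels for a hypothetical image of the word "hello".
frames = ["h", "h", "e", "l", "-", "l", "o", "o"]
word = "".join(ctc_greedy_decode(frames))
```

The blank between the two "l" frames is what allows the doubled letter to survive the merge step.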
RNN
- Bidirectional RNN (commonly used in text recognition)
- Stacked RNN (Baidu stacks 7 layers, Google stacks 5)
- MDLSTM/Grid LSTM
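The bidirectional and stacked variants above can be sketched in plain Python. This is a toy illustration under stated assumptions: a simple tanh RNN cell stands in for LSTM/GRU, the weights are random, and all names (`rnn_pass`, `bidirectional_layer`, `init_weights`) are hypothetical. Each layer runs one pass left to right and one right to left, then concatenates the two hidden vectors at every time step; stacking feeds that output into the next layer.

```python
import math
import random


def rnn_pass(xs, w_in, w_rec, reverse=False):
    """Run a tanh RNN over xs; return one hidden vector per time step."""
    hidden = len(w_rec)
    h = [0.0] * hidden
    steps = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    outputs = [None] * len(xs)
    for t in steps:
        h = [
            math.tanh(
                sum(w_in[i][j] * xs[t][j] for j in range(len(xs[t])))
                + sum(w_rec[i][k] * h[k] for k in range(hidden))
            )
            for i in range(hidden)
        ]
        outputs[t] = h
    return outputs


def bidirectional_layer(xs, fwd_weights, bwd_weights):
    """Concatenate forward and backward hidden states at each time step."""
    forward = rnn_pass(xs, *fwd_weights)
    backward = rnn_pass(xs, *bwd_weights, reverse=True)
    return [hf + hb for hf, hb in zip(forward, backward)]


def init_weights(n_in, n_hidden, rng):
    """Random input and recurrent weight matrices (illustrative only)."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
    return w_in, w_rec


rng = random.Random(0)
n_in, n_hidden, n_layers, seq_len = 4, 3, 2, 5
sequence = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(seq_len)]

out, size = sequence, n_in
for _ in range(n_layers):  # stack: each layer consumes the previous layer's output
    out = bidirectional_layer(
        out, init_weights(size, n_hidden, rng), init_weights(size, n_hidden, rng)
    )
    size = 2 * n_hidden  # the next layer sees forward + backward features
```

Every time step of `out` then carries context from both the left and the right of that position, which is the property that makes bidirectional layers useful for reading a text line.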
challenge
- Chinese includes too many characters
- Uncountable labels
- Insufficient data (Synthesize)
- Much more computation
- Incaptable
- Heavy perspective transformation (STN)
- Vertical layout