I spent an entire day last weekend setting up OpenCV on a Raspberry Pi. With most software it's as simple as an apt-get, grab a drink, and you're done. OpenCV is definitely the most involved suite of software I've ever had to install.
I had originally intended to do most of the coding on my desktop and then simply transfer the completed .py file to the Pi when ready. After discovering how difficult OpenCV is to set up, and how different the process is on Windows versus Linux, I opted to do the whole project on the Pi. I would eventually need OpenCV on the Pi anyway for the finished machine, so setting it up on my desktop as well would just add that much more headache.
Installation on the Pi went pretty smoothly thanks to this guide, although the multi-threaded make kept erroring out, so I had to build on a single core, which took ~8 hours. With the install complete I took a disk image of the SD card using Win32 Disk Imager as a backup. With this, if I messed up too badly I wouldn't have to repeat the day-long installation. That was a lesson learned from several previous Pi projects.
Day 1 (4/28/17)
Time to start learning about OpenCV. Thankfully there are a lot of informative tutorials around, so I started with the basics. I've jumped into projects before by skipping straight to chapter 8, where the data I need is, but with such a lack of fundamentals that I ended up in a sticky situation. This time I wanted to at least read through most of the basics and try them out, to get a good understanding of what the software can and can't do.
The software makes some powerful tasks, such as edge detection, very easy, and by the end of Day 1 I thought I had made some pretty decent progress. If I set the card perfectly square, I was able to detect the card 1 in 10 times. When it did detect the card, it could trim the image down to the desired title text box pretty readily (because the card was perfectly square) and then scan the text. The scanning (via PyTesseract) would get the name exactly right 1 in 20 times and a pretty close name 1 in 5 times. Definitely not perfect, but I thought I had made some large strides.
Throughout the evening, as I pondered how the software I'd made would fit into the overall system, I realized that having to set the card perfectly square would be rather restrictive. It would be much better if I could deal with a slightly misaligned card: not totally skewed, but a bit off from perfect. I resolved that on Day 2 I would focus on detecting the card itself, and from there dive deeper into its various parts.
Day 2 (4/29/17)
Back to basics. Time to just find the card on the white sheet of paper. Since I'm the end user and control the method of image capture, I don't need to plan for all possible backgrounds or orientations, so I made some assumptions to make things easier. Within a couple of hours I could bilaterally blur the image, perform threshold detection, and then find the contour of the card's outer edge. This is where the difficulty starts.
It seemed that OpenCV already had the perfect method for me, warpPerspective, which turns a skewed image into a rectangular one as long as you know the corners of both the original and the desired image shapes. The desired shape is just a standard card, and I had detected the contours of the original image, so I thought I was golden. As it turns out, though, the contours aren't always detected as a rectangle; sometimes they come back as multiple curves that look like a rectangle but have more vertices. I tried minAreaRect to find the minimum bounding rectangle, but since the card's perspective was slightly skewed it captured a bit of extra area too. The problem always came back to the same trade-off: with minAreaRect I captured extra area and got no benefit from warping the perspective, while finding the four true corners of an arbitrary set of curves that may or may not be a rectangle is a substantial endeavor in its own right.
From here I decided to throw my weight around since, again, I'm the end user. Baking one more assumption into the software makes the problem easier: I will assume that the card and camera are always normal to each other and the card is reasonably centered. I can control this with my machine design relatively easily, and it simplifies the problem. Now the card can be rotated and slightly off center, but I don't need to worry about skew. I can find the minAreaRect, rotate the picture by that rectangle's angle, then crop, and I'll have my card. With my test images this seemed to work reasonably well, although because my cards were not precisely normal there's still a small bit of skew and extra whitespace on the sides of the images.
So, now I have a program that can find the card and crop a reasonable image of it given the following assumptions:
The card is the only thing on a white background
The card is normal to the camera and reasonably centered
The card is mostly pointing up (anywhere from 3 to 12 to 9 on a clock)
Below are three of the test images that I worked with followed by the scanning results at various stages of manipulation.
The next challenge is to find the title block itself and extract just the text. I had hoped that, having found the card image, I could just extract the top n pixels, trim a bit off the sides, and be set. However, the slight misalignment and whitespace allowed by the minAreaRect approach mean I can't crop as precisely as I'd need to. As you can see in the detected card images above, the final card is skewed slightly differently in each case. My next best guess is that I'll need to dive back into contours and somehow trim away everything outside the title block, but I'm not yet sure how to do that.
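For context, the naive crop I had hoped would work is just array slicing. The fractions here (top 12% of the card, 5% off each side) are made-up illustrative numbers, and this is precisely the approach that the leftover skew and whitespace defeat:

```python
import numpy as np

# Stand-in for a cropped card image (height x width grayscale array).
card = np.zeros((280, 200), dtype=np.uint8)

h, w = card.shape
# Slice off the top strip where the title block usually sits, trimming a
# margin from each side. This breaks when the card is slightly rotated or
# padded with whitespace, because the strip no longer lands on the title.
title = card[0:int(0.12 * h), int(0.05 * w):int(0.95 * w)]
```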
Machine vision makes powerful tasks easy, but it's hard to get it to do exactly what you want, especially when the task seems so intuitively simple to a human. Ah well, Rome wasn't built in a day, or some such. Cheers!