Many topics in Artificial Intelligence seem complicated to grasp, or are so over-explained that the core idea gets lost. When I was learning the different use cases of Machine Learning, Computer Vision was one of those incomprehensible concepts. It seemed like a field of AI that would be out of reach until I became an absolute expert. However, as I dove deeper into it and simplified the definition, it became apparent that Computer Vision was an attainable skill set, and that its use cases were more prominent in my everyday life than I had realized.
What is Computer Vision?
Computer Vision (CV) is a field of AI that enables computers and computer-based systems to extract useful information from visual inputs. These inputs can range from single images to full-length videos. The computer then makes decisions, takes actions, or generates recommendations based on them. Which of these happens depends on what the CV application was programmed to do.
These computers or systems typically have a camera attached (think of a cellphone, a traffic camera, or a camera mounted on a car) that takes photos or records video and feeds that visual input to the algorithm.
While learning more about Computer Vision, I noticed it was sometimes used interchangeably with AI Vision, which could be confusing. However, the most significant difference is that AI Vision can be utilized as a stand-alone technology.
How Does CV Work?
Pattern recognition is the foundation of Computer Vision. First, machines are trained, mainly by Machine Learning engineers, on large amounts of visual data, digesting hundreds or thousands of related images. Then, using deep learning with convolutional neural networks (CNNs), the machine can teach itself about the context of the input data. The goal of this stage is for the computer to learn to distinguish one image from another and to categorize, tag, and label them accordingly.
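To make the idea of learning from example images concrete, here is a minimal sketch. It swaps in a perceptron, which is far simpler than a CNN, and it assumes made-up 2x2 "images" with made-up labels. The names `TRAINING_DATA`, `predict`, and `train` are my own for illustration and do not come from any library.

```python
# Toy 2x2 grayscale "images", flattened row by row:
# [top-left, top-right, bottom-left, bottom-right].
# Label 1 = bright top row, label 0 = bright bottom row. All values are made up.
TRAINING_DATA = [
    ([1.0, 1.0, 0.0, 0.0], 1),
    ([0.9, 0.8, 0.1, 0.0], 1),
    ([0.0, 0.0, 1.0, 1.0], 0),
    ([0.1, 0.2, 0.9, 0.8], 0),
]


def predict(weights, bias, pixels):
    """Return 1 if the weighted sum of the pixels is positive, otherwise 0."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0


def train(data, learning_rate=0.1, max_epochs=100):
    """Perceptron training: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for pixels, label in data:
            error = label - predict(weights, bias, pixels)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * p
                           for w, p in zip(weights, pixels)]
                bias += learning_rate * error
        if mistakes == 0:  # every training image is now classified correctly
            break
    return weights, bias, epoch


weights, bias, epochs_used = train(TRAINING_DATA)
# An image the model has not seen before, with a mostly bright top row.
print(predict(weights, bias, [0.7, 0.9, 0.2, 0.1]))
```

A real CV model has millions of weights instead of four, but the loop has the same shape: predict, compare with the known label, adjust, repeat.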
The CNN works on an image as a grid of pixels. It performs convolutions, sliding small filters across the pixels to pick up features such as edges and shapes, and uses those features to predict a label, tag, or category for what it is "seeing." The network then checks the accuracy of its predictions against the known labels and adjusts itself over a series of iterations until the predictions become accurate.
CNNs are used for individual images, while a similar process with recurrent neural networks (RNNs) is used for videos to help machines understand how the frames in a sequence relate to one another.
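A single convolution can be sketched in a few lines. This is only an illustration on a made-up 4x4 image with a hand-written filter. In a real CNN the filter values are learned during training rather than typed in, and `convolve2d`, `IMAGE`, and `VERTICAL_EDGE_KERNEL` are names I chose for this example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    This is the cross-correlation form that deep learning libraries
    usually call "convolution".
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Made-up 4x4 image: dark left half (0), bright right half (1).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds where brightness jumps from left to right.
VERTICAL_EDGE_KERNEL = [
    [-1, 1],
    [-1, 1],
]

for row in convolve2d(IMAGE, VERTICAL_EDGE_KERNEL):
    print(row)  # each row prints [0, 2, 0]
```

The 2 in the middle column marks the spot where the dark half meets the bright half. The filter "found" a vertical edge without being told where it was.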
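The recurrent idea can be sketched with a single number of "memory" that is carried from one frame to the next. This is a rough sketch, assuming each frame has already been reduced to one made-up feature value (say, average brightness). The function `run_recurrent` and its weights are hypothetical and chosen only for illustration.

```python
import math


def run_recurrent(frame_features, w_input=1.0, w_hidden=0.5):
    """Update one hidden value frame by frame, mixing in the previous state."""
    hidden = 0.0
    states = []
    for x in frame_features:
        # The new state depends on the current frame AND on what came before.
        hidden = math.tanh(w_input * x + w_hidden * hidden)
        states.append(hidden)
    return states


# The same two frames in a different order: dark then bright, bright then dark.
dark_then_bright = run_recurrent([0.0, 1.0])
bright_then_dark = run_recurrent([1.0, 0.0])

print(dark_then_bright[-1])
print(bright_then_dark[-1])
```

The two final values differ even though both clips contain the same frames, which is the point: the order of the frames changes what the network "remembers."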
Real-World Examples of Computer Vision
Autonomous Vehicles: Autonomous vehicles have cameras and sensors mounted on all sides to enable driver assistance and autopilot features.
Theft, Surveillance, & Security: CV can track theft at checkout and misplaced inventory, which can lead to theft. Whether it's a parking lot, bus stop, or public train, CV enhances security through face recognition, crowd detection, detection of abnormal human behavior, and plenty more.
Medical Field: CV is used for image classification and pattern detection every day in the medical field. This helps a doctor reach a diagnosis based on how similar one image is to known images from patients diagnosed with that illness. For example, if a doctor suspects you have lung cancer and takes images of your lungs, and those images closely match several other images of cancerous lungs, the doctor knows the possibility is high and can move forward with the diagnosis.
Agriculture: Planting and harvesting can be enhanced with the use of CV. Drones use CV to monitor crops and yields, automate pesticide spraying, record weather patterns, or even provide security around the field.
Manufacturing: CV can inspect products to prevent breakdowns, scan for missing pieces, or search for safety hazards.
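The "similarity to known images" idea from the Medical Field example can be sketched as a nearest-neighbor lookup. This is a toy illustration and not a medical tool. It assumes each image has already been turned into three made-up feature numbers, and `LABELED_EXAMPLES` and `classify_by_similarity` are hypothetical names.

```python
import math
from collections import Counter

# Made-up feature vectors standing in for previously labeled images.
LABELED_EXAMPLES = [
    ([0.90, 0.80, 0.70], "abnormal"),
    ([0.80, 0.90, 0.60], "abnormal"),
    ([0.85, 0.75, 0.80], "abnormal"),
    ([0.10, 0.20, 0.10], "normal"),
    ([0.20, 0.10, 0.30], "normal"),
    ([0.15, 0.25, 0.20], "normal"),
]


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_by_similarity(features, examples, k=3):
    """Return the majority label among the k most similar examples,
    plus the share of those k neighbors that agree with it."""
    nearest = sorted(
        examples, key=lambda ex: euclidean_distance(features, ex[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


print(classify_by_similarity([0.8, 0.8, 0.7], LABELED_EXAMPLES))
```

The second value in the result is only the fraction of neighbors that agree. It gives a rough sense of how strongly the new image resembles the known ones, which is the kind of signal a doctor would weigh alongside everything else.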
CV Applications You May Have Directly Come Across
Sports Broadcasting: IBM used CV to generate My Moments for the Masters Golf Tournament in 2018. After IBM Watson watched hours of footage, it was able to identify the sights and sounds of significant shots and curate these highlights on its own.
Google Translate: You can now just point your phone at text in another language and, with the use of CV, immediately receive a translation. Whether it's a food menu or a street sign, the translation can take place.
Face ID: If you use Face ID to unlock your phone or to access secure data on your device, you interact with CV every day.
Drawbacks of Computer Vision
Any time a new concept is pushed into real-world applications, some drawbacks and errors follow. CV usually only completes the tasks it was explicitly trained to perform; a CV application becomes useless if a new task is required but there is no existing data for it to learn from.
For example, if a drone is deployed to watch for birds and forest animals to protect your crops, but a car drives off the road and into the field instead, you wouldn't be alerted because the drone wasn't programmed to detect this type of event.
Beyond the lack of high-quality data, creating a machine that sees exactly as a human does is simply a difficult task. It's often a doctor who understands human vision and its neurological concepts, not a machine learning engineer, so it will take time for that knowledge to be acquired at the level that is needed.
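The drone scenario can be sketched with a toy classifier that only knows two classes. The feature values and class names are made up, and the distance threshold in `classify_with_rejection` is one possible mitigation that I am assuming for illustration. It is not something a given CV system necessarily does.

```python
import math

# Hypothetical average features per known class: [size, height above ground].
KNOWN_CLASS_CENTROIDS = {
    "bird": [0.2, 0.9],
    "deer": [0.6, 0.1],
}


def closest_known_class(features):
    """Always answers with one of the known classes, however poor the match."""
    distances = {
        label: math.dist(features, centroid)
        for label, centroid in KNOWN_CLASS_CENTROIDS.items()
    }
    label = min(distances, key=distances.get)
    return label, distances[label]


def classify_with_rejection(features, max_distance=0.3):
    """Same lookup, but reports "unknown" when nothing known is close enough."""
    label, distance = closest_known_class(features)
    return label if distance <= max_distance else "unknown"


car = [1.5, 0.0]  # large and on the ground, unlike anything in the training data

print(closest_known_class(car)[0])          # prints deer
print(classify_with_rejection(car))         # prints unknown
print(classify_with_rejection([0.55, 0.15]))  # prints deer
```

Without the threshold, the car is quietly filed under the nearest class the model knows. With it, the system can at least say it does not recognize what it sees, though it still cannot tell you that the object is a car.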
Conclusion
Computer Vision is a rapidly growing area of Machine Learning and AI and has been able to evolve across various industries. While it does have its share of drawbacks, engineers will keep building with CV, and data will become more readily available and thorough, putting CV adoption on an upward trajectory.