Artificial Intelligence Unpacked

Artificial Intelligence is a big subject. To better understand the field and the technology, we unpack the various areas where AI is applied.

Sponsored by Wevolver

Today we witness numerous applications of artificial intelligence, ranging from pattern recognition (such as vision, speech recognition, and fraud detection), through intelligent behaviour (learning, cognition, recommender systems), to autonomous and cognitive systems (cars, robots, etc.).

Written by IT Analyst Zoran Gacovski for Wevolver.

By intelligence we normally mean the ability to acquire, memorize, and process knowledge. A person (or machine) without any knowledge cannot be considered intelligent. Likewise, a person (or machine) with huge amounts of "static" knowledge, but without the ability to process that knowledge to solve related problems, cannot be considered intelligent either. The ability to learn - to acquire new knowledge - is another aspect of intelligence, although we can classify it as an ability to solve problems. We also consider the ability to communicate with other intelligent beings (or machines) an "intelligent feature", which can likewise be classified as problem-solving.

A definition given by Tom Mitchell in 1998 states that a computer program is said to learn from experience E with respect to a task T and a performance measure P, if its performance on T, as measured by P, improves with experience E. For example, for a checkers-playing program, T is playing checkers, P is the percentage of games won, and E is the experience gained from playing games.

Algorithms and Methods

The challenge in AI and machine learning is how to accurately (algorithmically) describe some kinds of tasks that people can easily solve (for example, face recognition, speech recognition, etc.). Such algorithms can be defined for certain types of tasks, but they can be very complex and/or require a large knowledge base:

  • Information retrieval (IR) - finding existing information as quickly as possible; for example, web search engines quickly finding information within the (large) set of the entire WWW.
  • Machine learning (ML) - a set of techniques that generalize existing knowledge to new information as precisely as possible. An example is speech recognition. Deep learning is a subset of ML, on which we will expand later in this article.
  • Data mining (DM) - primarily relates to the discovery of something hidden within the data, some new dependency that has not previously been known (e.g., customer analysis).

There are two types of machine learning:
1. Supervised learning: classification and regression.
2. Unsupervised learning: clustering, generative models, etc.

In supervised learning, the algorithm (e.g., decision tree, random forest) is provided with data from which it learns, together with the desired outputs. The algorithm has to learn to produce the correct output from the given dataset. Thus, the software program that learns must receive:

- one set of input parameters (x1, x2, ..., xn), and
- one set of desired / correct values, so that for each input xi, we have the desired / correct output yi.

The task of the program is to "learn" how to assign new, unlabeled inputs to the correct output value. The output value can be:

- a label (nominal value) - this is classification.
- a real number - this is regression.

An example of supervised learning (with a neural network) is predicting the price of real estate based on its surface area. The learning data set will comprise market data about units' prices and surface areas.
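
As a rough illustration, the sketch below trains such a price-from-area regressor with scikit-learn on invented example figures; the library choice and the data are assumptions made here for demonstration, not the model described above.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical training data: surface area in m^2 (input x) and price in EUR (desired output y).
    areas = np.array([[35.0], [50.0], [65.0], [80.0], [100.0], [120.0]])
    prices = np.array([70_000, 95_000, 120_000, 150_000, 185_000, 220_000])

    # Supervised learning: the model is given both the inputs and the desired outputs.
    model = LinearRegression()
    model.fit(areas, prices)

    # Regression: the output for a new, unseen input is a real number (a predicted price).
    print(model.predict(np.array([[72.0]])))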

Other examples of supervised learning (classification) include:

  • Pattern recognition (vision).
  • Face recognition: pose, lighting, occlusion (glasses, beard), makeup, hairstyles.
  • Character recognition: printed and handwritten text (a minimal digit-recognition sketch follows this list).
  • Speech recognition: time-dependent speech parameters.
  • Medical diagnostics: from symptoms to diseases.
  • Biometrics: user identification/ authentication using physical characteristics or behavior: face, iris, signature, fingerprint, gait.
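
To make the classification case concrete, here is a minimal sketch of recognizing handwritten digits, assuming scikit-learn and its bundled digits dataset; it is an illustrative stand-in for the character-recognition task above, not a description of any particular production system.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Labeled data: 8x8 images of handwritten digits (inputs) and their digit labels (desired outputs).
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=0)

    # Classification: the output is a label (the digit 0-9), not a real number.
    clf = SVC(kernel="rbf", gamma=0.001)
    clf.fit(X_train, y_train)

    # Performance measure P: accuracy on previously unseen test images.
    print("test accuracy:", clf.score(X_test, y_test))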

The unsupervised learning algorithm (e.g., K-means clustering) is provided only with data, not with a desired output. The algorithm must recognize dependencies in the data that is given to it. So, unsupervised means that:

  • We have no information about the desired output value.
  • The software receives only a set of input parameters (x1, x2, ..., xn).
  • The task of the program is to reveal hidden structures/dependencies among the data.

An example of unsupervised learning (grouping) is determining clothing sizes based on the height and weight of people.
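
A minimal sketch of this grouping, assuming scikit-learn's K-means implementation and invented height/weight values, could look as follows; note that the algorithm is given no size labels at all.

    import numpy as np
    from sklearn.cluster import KMeans

    # Unlabeled data: height (cm) and weight (kg) of people; no desired output is given.
    people = np.array([
        [158, 55], [160, 58], [163, 61],   # roughly "small"
        [170, 68], [172, 72], [175, 75],   # roughly "medium"
        [182, 85], [185, 90], [188, 95],   # roughly "large"
    ])

    # K-means groups the data into 3 clusters, which can be read as S / M / L sizes.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = kmeans.fit_predict(people)

    print(labels)                   # cluster index assigned to each person
    print(kmeans.cluster_centers_)  # the "typical" height/weight of each size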

Deep Learning

Deep learning is a branch of machine learning based on complex data representations at a higher degree of abstraction, which are obtained by a chain of learned nonlinear transformations. Deep learning methods are usually applied in important areas of artificial intelligence such as computer vision, natural language processing, speech and sound comprehension, as well as bioinformatics. This learning is based on advanced discriminative and generative deep models, with particular emphasis on practical implementations.

The key elements of deep learning are the classical neural networks, their building blocks, regularization techniques, and deep model-specific learning methods. Other approaches involve deep convolutional models, which can be applied in image classification and natural language processing. A third segment is the generative deep models and their applications in machine vision and natural language understanding. All these techniques can lead to sequence modeling by deep recurrent (feedback) neural networks and be applied in the field of robotics and self-driving cars.

Deep learning methods can be implemented using modern dynamic languages (Python, Lua, or Julia). Furthermore, there are modern deep learning application frameworks (e.g. Theano, TensorFlow, Torch).
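
As a hedged illustration of these ideas, the sketch below builds a small feed-forward network with TensorFlow/Keras on the bundled MNIST handwritten-digit data; the framework and dataset are choices made here for demonstration, and the model is intentionally tiny rather than representative of state-of-the-art deep learning.

    import tensorflow as tf

    # Load a standard labeled image dataset (handwritten digits) and scale pixels to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small feed-forward network: a chain of learned nonlinear transformations.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),        # a simple regularization technique
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=3)
    model.evaluate(x_test, y_test)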

The State of AI in Different Subject Areas

Vision

Machine vision (artificial sight) is the ability to recognize images and to understand what is seen. It involves digital cameras, analog-to-digital conversion, and digital signal processing. After the image is taken, the particular steps within machine vision include:

  • Image processing - stitching, filtering, pixel counting.
  • Segmentation - partitioning the image into multiple segments to simplify and/or change the representation of the image into something that is meaningful and easier to analyze.
  • Blob detection - inspecting the image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks. These blobs frequently represent optical targets for observation, robotic capture, or manufacturing defects.
  • Pattern recognition algorithms, including template matching, i.e. finding and matching specific patterns using some ML method (neural networks, deep learning, etc.). The object may be re-positioned or may vary in size (a minimal template-matching sketch follows this list).
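
As a minimal illustration of template matching, the sketch below uses OpenCV's matchTemplate; the file names are hypothetical, and plain template matching of this kind does not by itself handle large changes in position, scale, or rotation, which is where the ML methods mentioned above come in.

    import cv2

    # Hypothetical file names; template matching searches a scene image for a smaller pattern.
    scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

    # Slide the template over the scene and score each position by normalized cross-correlation.
    scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_location = cv2.minMaxLoc(scores)

    h, w = template.shape
    print("best match at", best_location, "score", best_score)

    # Draw a rectangle around the best match for inspection.
    x, y = best_location
    cv2.rectangle(scene, (x, y), (x + w, y + h), 255, 2)
    cv2.imwrite("match.png", scene)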

Speech Recognition and Synthesis

As with all speech technologies, speech recognition is a multidisciplinary problem that requires knowledge in many fields, ranging from acoustics, phonetics, and linguistics to mathematics, telecommunications, signal processing, and programming. A particular issue is the fact that the problem is extremely language-dependent.

The task of automatic speech recognition (ASR) is to obtain an appropriate text record based on the audio input of a speech unit (words or sentences). In this way, the speech is practically converted to text, that is, "recognizing" what a particular speaker has said.

We distinguish between ASR systems that recognize isolated words and systems that can recognize whole sentences as well. ASR systems can also be sorted by vocabulary (the number of words they can recognize), by whether they recognize only fixed, predefined words or are phonetic-based (recognize individual sounds), and by whether they are dependent on or independent of the speaker. The main AI (machine learning) methods for speech recognition are the Hidden Markov Model and the Naïve Bayesian Classifier.
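
As a toy illustration of the HMM approach, the sketch below assumes the librosa and hmmlearn libraries and hypothetical recordings of a two-word vocabulary; real ASR systems are vastly more elaborate, so treat this purely as a sketch of the idea of training one model per word.

    import librosa
    import numpy as np
    from hmmlearn import hmm

    def mfcc_features(wav_path):
        # Convert a recorded word into a sequence of MFCC feature vectors (frames x coefficients).
        signal, sample_rate = librosa.load(wav_path, sr=16000)
        return librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13).T

    # Hypothetical training files: several recordings per vocabulary word.
    training_data = {
        "yes": ["yes_01.wav", "yes_02.wav", "yes_03.wav"],
        "no":  ["no_01.wav", "no_02.wav", "no_03.wav"],
    }

    # Train one Gaussian HMM per word on the concatenated feature sequences.
    models = {}
    for word, files in training_data.items():
        sequences = [mfcc_features(f) for f in files]
        lengths = [len(s) for s in sequences]
        model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
        model.fit(np.vstack(sequences), lengths)
        models[word] = model

    # Recognition: pick the word whose model gives the new utterance the highest likelihood.
    test = mfcc_features("unknown.wav")
    print(max(models, key=lambda w: models[w].score(test)))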

Text-based Speech Synthesis (TTS) is the oldest speech technology, and its beginnings go back to the 19th century when the first "talking machines" appeared. In the meantime, there has been dramatic development in this area, thanks to the development of computer technology in recent decades. The task of speech synthesis is to generate a human-like speech signal based on text input. This also implies that synthesized speech should sound natural, i.e. should have an intonation characteristic of natural human speech.
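
For completeness, a minimal sketch of invoking an off-the-shelf synthesizer is shown below; pyttsx3 is an assumed, commonly used off-line TTS library that is not mentioned in the article, and modern neural TTS systems work very differently.

    import pyttsx3

    # Off-line text-to-speech: generate a spoken rendering of a text input.
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)   # speaking rate in words per minute
    engine.say("Speech synthesis converts text into a human-like speech signal.")
    engine.runAndWait()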

Fraud Detection

In today’s digital world, fraud can appear everywhere: payment card fraud, online and e-banking fraud, mobile app fraud, payment transaction fraud, and savings and credit fraud. By using proper AI and ML methods, many of these frauds can be detected and prevented.

The AI solutions integrate with authorized (banking, e-commerce) systems and, with the help of built-in client profiling and machine learning capabilities, enable efficient monitoring of transactions and real-time blocking of those assessed as high-risk. AI solutions can provide effective anti-fraud tools that significantly reduce risk exposure and operational losses, and improve the operational efficiency of fraud prevention teams (protecting the financial institution's reputation). If done properly, machine learning can clearly distinguish legitimate from fraudulent behaviour while adapting over time to new, previously unseen fraud tactics.

To develop highly reliable anti-fraud solutions, we must consider both supervised and unsupervised learning methods as a cohesive strategy, and apply behaviour analytics.
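
The sketch below illustrates, under assumed scikit-learn tooling and invented transaction features, how the two views can sit side by side: an unsupervised anomaly detector flags unusual transactions, while a supervised classifier learns from confirmed fraud labels where they exist.

    import numpy as np
    from sklearn.ensemble import IsolationForest, RandomForestClassifier

    # Hypothetical transaction features, e.g. amount, hour of day, distance from home (km).
    transactions = np.array([
        [25.0, 14, 2.0], [40.0, 10, 1.5], [12.5, 19, 3.0],
        [980.0, 3, 850.0],   # an unusual, possibly fraudulent transaction
        [33.0, 12, 2.2], [55.0, 17, 4.1],
    ])
    labels = np.array([0, 0, 0, 1, 0, 0])  # 1 = confirmed fraud (when such labels exist)

    # Unsupervised view: flag transactions that look anomalous, without using labels.
    detector = IsolationForest(contamination=0.2, random_state=0)
    print(detector.fit_predict(transactions))   # -1 marks suspected anomalies

    # Supervised view: learn from past confirmed fraud cases where labels are available.
    classifier = RandomForestClassifier(n_estimators=100, random_state=0)
    classifier.fit(transactions, labels)
    print(classifier.predict([[700.0, 2, 400.0]]))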

Recommender Systems (Data Analytics)

With the advent of large online platforms (Amazon, Netflix, AliExpress, Booking, TripAdvisor), there has been a need to provide recommendations instead of listing the full range of products available. In line with the user's needs, the system (recommendation algorithm) generates recommendations using different sets of the user's data (previous transactions stored in the database). When receiving these recommendations, users can either accept or reject them, thus providing feedback for later stages.

The main algorithms used in recommender systems include market-basket analysis, association rules, nearest neighbors, Boltzmann machines, etc. There are six main types of recommender systems:

  • Content-based - the system recommends items similar to those previously purchased by the user.
  • Collaborative filtering - the system recommends items purchased by customers with profiles (parameters) similar to the current user's (see the sketch after this list).
  • Demographic-based - recommendations based on the user's demographic characteristics (age, sex, location - address, religion, etc.).
  • Knowledge-based - these are case-based systems; they ask the user for which purpose (goal) the product (service) will be used (e.g. "do you like thriller movies?" or "would you like a sea-view room?") and then make suitable recommendations.
  • Community-based - "tell me who your friends are, and I'll tell you who you are." This is the approach of these systems: they offer products similar to those previously purchased by your friends (network).
  • Hybrid systems - a combination of two or more previously listed approaches.
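
As a toy illustration of collaborative filtering, the sketch below uses plain NumPy and an invented user-item rating matrix; production recommenders use far richer models, so this only shows the basic idea of weighting other users' ratings by their similarity to the current user.

    import numpy as np

    # Hypothetical user-item matrix: rows are users, columns are items, values are ratings (0 = not rated).
    ratings = np.array([
        [5, 4, 0, 1, 0],
        [4, 5, 1, 0, 0],
        [1, 0, 5, 4, 4],
        [0, 1, 4, 5, 5],
    ], dtype=float)

    def cosine_similarity(a, b):
        # Similarity between two users' rating vectors.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    def recommend(user_index, top_n=2):
        # Collaborative filtering: weight other users' ratings by their similarity to this user.
        weights = np.array([cosine_similarity(ratings[user_index], other) for other in ratings])
        weights[user_index] = 0.0
        predicted = weights @ ratings / (weights.sum() + 1e-9)
        unseen = ratings[user_index] == 0            # only recommend items the user has not rated
        candidates = np.where(unseen, predicted, -np.inf)
        return np.argsort(candidates)[::-1][:top_n]

    print(recommend(0))  # item indices recommended for the first user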

Cognition

IBM developed the IBM Watson cognitive computer, which is applicable in all areas, from making the most complex business decisions to the daily activities of the masses. Aside from its many abilities, Watson has won the US TV quiz show Jeopardy!. It is one thing to teach a supercomputer to play chess, and something else entirely to understand the complex, difficult strands of English sentences full of synonyms, slang, and logic, and to give the correct answer. Today, Watson is even more advanced. Some of its services are completely free of charge and can be implemented everywhere, even in the smallest startups. It is also used in cooking, where the Chef Watson app helps chefs create new recipes. The point is that nothing is explicitly programmed: after being "fed" thousands of recipes, Chef Watson figures out by itself which foods, spices, and other ingredients go best with one another, and it continues to learn on its own. Watson can also be a weather forecaster, an airplane controller (pilot), a chatbot, and much more.

The purpose of the cognitive systems developed and implemented by IBM is to extend human intelligence. Their technology, products, services, and policies will be designed to enhance and extend human capacity, expertise, and potential. Their attitude is based not only on principles but also on cognitive science. In IBM they say: "Cognitive systems will not realistically reach consciousness or independent activity. Instead, they will increasingly be embedded in the processes, systems, products, and services through which business and society function - all of which will and should remain within human control.”

Integrated Systems

Autonomous Driving

Autonomous vehicles, also known as robotic vehicles or self-driving vehicles, are motor vehicles that can move independently (i.e. without driver/human assistance), so that all real-time driving functions are transferred to the so-called Vehicle Automation System. This type of vehicle has the ability to perform all the steering and movement functions otherwise performed by a human being and can detect (see) the traffic environment, while the "driver" only needs to choose a destination and does not have to perform any operation while driving.

An autonomous vehicle operates independently by means of video cameras, radar sensors, and laser range-finders, which can also "see" other road users, as well as by downloading detailed maps. Google's Street View data allows the car to plan its route by obtaining the road maps and intersections ahead of time. The vehicle constantly records the information it collects from the environment using ultrasonic sensors and cameras. By processing the images from the video cameras, the autonomous vehicle control system detects the position of the vehicle with respect to the marked lines on the road surface, and in cooperation with the other sensor systems, the control system determines the distance to the surrounding vehicles, as well as their relative speeds.
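
As a small, assumed illustration of the lane-marking step, the sketch below uses classical OpenCV edge and line detection on a hypothetical camera frame; real autonomous-driving stacks combine this kind of processing with learned models and multi-sensor fusion.

    import cv2
    import numpy as np

    # Hypothetical frame from a forward-facing camera.
    frame = cv2.imread("road_frame.png")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Edge detection highlights the painted lane markings on the asphalt.
    edges = cv2.Canny(gray, 50, 150)

    # The Hough transform finds straight line segments among the edge pixels.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)

    # Draw the detected segments; a real system would fit lane models and track them over time.
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imwrite("lanes.png", frame)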

Humanoid Robotics

The near future will bring us robots that are closely related to us, which can move, communicate, think, and feel like humans. For example, Asimo, a humanoid robot designed and developed by Honda, was introduced on October 21, 2000. It is 130 cm tall, weighs 48 kg, and was designed to work in real environments. Asimo can walk at up to 6 km an hour, can take and hand over items, can shake hands, and can step forwards or backwards to a certain extent. It can understand and speak certain instructions and exchange simple sentences with people (it can make about 50 different calls/greetings and responds to 30 commands). It is often used in demonstrations around the world to teach science and math.

In today's terms, Asimo's features are quite modest; newer humanoid robots (e.g. the Sophia robot, Hanson Robotics, 2016) have much more capable speech recognition, can hold a conversation (chatbot), can recognize individuals (camera vision), and can make their own decisions (cognition). Sophia's purpose is to be a suitable companion for the elderly at nursing homes, or to help crowds at large events or parks. Sophia is also the first robot to be granted citizenship.

Conclusion

The concept of Artificial Intelligence was introduced in the 1950s by John McCarthy and Marvin Minsky (MIT). Since then, numerous fields have emerged (e.g. fuzzy logic, neural networks, Bayesian classifiers), together with corresponding applications (speech recognition, machine vision, autonomous robotics).

The field of AI is booming nowadays due to major advances in processing power (multi-core and parallel processors), as well as new software tools (logic programming, big data, Python).

In areas such as machine learning, we are witnessing new paradigms such as big data and deep learning, which work over huge training data sets and are based on complex data representations at a higher degree of abstraction.

In the areas of pattern recognition (machine vision), aside from traditional supervised learning (e.g. neural nets), there is the 'hot' field of deep learning neural nets, which statistically create their own rules by training on data sets of tagged images.

AI holds great promise as well as potential risks. As the technology keeps evolving, its areas of application will continue to expand. Therefore, expect us to revisit the topic regularly.

This article was written by IT Analyst Zoran Gacovski, and first published on Wevolver.
