Hybrid Genetic Relational Search for Inductive Learning
MetadataShow full item record
An important characteristic of all natural systems is the ability to acquire knowledge through experience and to adapt to new situations. Learning is the single unifying theme of all natural systems. One of the basic ways of gaining knowledge is through examples of some concepts.For instance, we may learn how to distinguish a dog from other creatures after that we have seen a number of creatures, and after that someone (a teacher, or supervisor) told us which creatures are dogs and which are not. This way of learning is called supervised learning. Inductive Concept Learning (ICL) constitutes a central topic in machine learning. The problem can be formulated in the following manner: given a description language used to express possible hypotheses, a background knowledge, a set of positive examples, and a set of negative examples, one has to find a hypothesis which covers all positive examples and none of the negative ones. This is a supervised way of learning, since a supervisor has already classified the examples of the concept into positive and negative examples. The so learned concept can be used to classify previously unseen examples. In general deriving general conclusions from specific observation is called induction. Thus in ICL, concepts are induced because obtained from the observation of a limited set of training examples. The process can be seen as a search process. Starting from an initial hypothesis, what is done is searching the space of the possible hypotheses for one that fits the given set of examples. A representation language has to be chosen in order to represent concepts, examples and the background knowledge. This is an important choice, because this may limit the kind of concept we can learn. With a representation language that has a low expressive power we may not be able to represent some problem domain, because too complex for the language adopted. On the other side, a too expressive language may give us the possibility to represent all problem domains. However this solution may also give us too much freedom, in the sense that we can build concepts in too many different ways, and this could lead to the impossibility of finding the right concept. We are interested in learning concepts expressed in a fragment of first--order logic (FOL). This subject is known as Inductive Logic Programming (ILP), where the knowledge to be learn is expressed by Horn clauses, which are used in programming languages based on logic programming like Prolog. Learning systems that use a representation based on first--order logic have been successfully applied to relevant real life problems, e.g., learning a specific property related to carcinogenicity. Learning first--order hypotheses is a hard task, due to the huge search space one has to deal with. The approach used by the majority of ILP systems tries to overcome this problem by using specific search strategies, like the top-down and the inverse resolution mechanism. However, the greedy selection strategies adopted for reducing the computational effort, render techniques based on this approach often incapable of escaping from local optima. An alternative approach is offered by genetic algorithms (GAs). GAs have proved to be successful in solving comparatively hard optimization problems, as well as problems like ICL. GAs represents a good approach when the problems to solve are characterized by a high number of variables, when there is interaction among variables, when there are mixed types of variables, e.g., numerical and nominal, and when the search space presents many local optima. Moreover it is easy to hybridize GAs with other techniques that are known to be good for solving some classes of problems. Another appealing feature of GAs is represented by their intrinsic parallelism, and their use of exploration operators, which give them the possibility of escaping from local optima. However this latter characteristic of GAs is also responsible for their rather poor performance on learning tasks which are easy to tackle by algorithms that use specific search strategies. These observations suggest that the two approaches above described, i.e., standard ILP strategies and GAs, are applicable to partly complementary classes of learning problems. More important, they indicate that a system incorporating features from both approaches could profit from the different benefits of the approaches. This motivates the aim of this thesis, which is to develop a system based on GAs for ILP that incorporates search strategies used in successful ILP systems. Our approach is inspired by memetic algorithms, a population based search method for combinatorial optimization problems. In evolutionary computation memetic algorithms are GAs in which individuals can be refined during their lifetime.