A Recurrent Connectionist Model of Person Impression Formation
Frank Van Overwalle, Department of Psychology
Free University of Brussels, Belgium
Christophe Labiouse, Belgian NFSR Researcher, Institute of Psychology
University of Liège, Belgium
Major findings in impression formation are reviewed and modeled from a connectionist perspective. The findings lie in the areas of primacy and recency in impression formation, asymmetric diagnosticity of ability-related and morality-related traits, increased recall of trait-inconsistent information, assimilation and contrast in priming, and discounting of trait inferences by situational information. Most of these phenomena are illustrated with well-known experiments and simulated with an auto-associative network architecture with linear activation update, using the delta learning algorithm to adjust the connection weights. All simulations successfully reproduced the empirical results. In addition, the proposed model is shown to be consistent with earlier algebraic models of impression formation (Anderson, 1981; Busemeyer, 1991; Hogarth & Einhorn, 1992). The discussion centers on how our model compares with other connectionist approaches to impression formation and how it may contribute to a more parsimonious and unified theory of person perception.
Getting to know others socially often involves drawing conclusions about their characteristics and personality traits. This is essential for social thinking because it allows us to go beyond a person's specific behaviors and to generalize to similar events in the future and to similar people. How do we extract personality traits from observing a person's behavior? How is this information processed and stored in memory? The aim of this article is to gain insight into these processes by applying a computational modeling perspective, specifically a connectionist approach, to these questions.
Until recently, the dominant view of the process of impression formation in social psychology has been that humans are intuitive statisticians who integrate behavioral information by applying abstract rules to it. Many of these models rely on an algebraic function to transform social information into an overall impression. Models of this type include Anderson's (1981) weighted averaging model, Hogarth and Einhorn's (1992) step-by-step belief-adjustment model, and Busemeyer's (1991) serial averaging strategy. Although supported by an impressive amount of empirical research, the most popular of these models, developed by Anderson, has been criticized for lacking psychological plausibility, since it seems unlikely that people go through all the necessary computations in their heads to arrive at an impression; for this reason, many researchers have abandoned algebraic models in general.
The main goal of this article is to revive computational modeling of impression formation by introducing a connectionist framework to describe perceivers' internal computations. This framework is loosely based on how the brain works (neurons relay activation to other neurons and develop synaptic connections between one another) to develop computational models of human thought. Previous attempts to apply the brain metaphor to social psychology used neuron-like representations and activation spreading as their main principles (Hamilton, Katz, & Leirer, 1980; Hastie & Kumar, 1979) but failed to formalize these ideas computationally. The growing success of connectionist models in cognitive psychology, which allow more precise computational implementations, has prompted a number of authors to turn to these models to develop connectionist accounts of various social psychological phenomena, including causal attribution (Read & Montoya, 1999; Van Overwalle, 1998), cognitive dissonance (Shultz & Lepper, 1996; Van Overwalle & Jordens, 2002), and group impression formation and
Personality and Social Psychology Review, 2004, Vol. 8, No. 1, 28-61
Copyright © 2004 by Lawrence Erlbaum Associates, Inc.
This research was supported by Grant OZR423 from the Vrije Universiteit Brussel to Frank Van Overwalle. We thank Bob French for his suggestions and comments on a previous version of this article.
Requests for reprints should be directed to Frank Van Overwalle, Department of Psychology, Vrije Universiteit Brussel, Pleinlaan 2, B-1050, Brussels, Belgium. E-mail:[email protected]
change (Kashima, Woolcock, & Kashima, 2000; Van Rooy, Van Overwalle, Vanhoomissen, Labiouse, & French, 2003).
Connectionist models have also been developed for impression formation. Kunda and Thagard (1996) developed a parallel constraint satisfaction model that described how social stereotypes and individuating information constrain each other's meaning and jointly influence impressions of individuals. However, this model was static and lacked a learning mechanism by which new impressions or stereotypes of groups and individuals could be developed and memorized. This shortcoming was remedied in the tensor product model of impression formation of Kashima and Kerekes (1994), which reproduced differential primacy and recency effects in impression judgments. In addition, the recurrent model of Smith and DeCoster (1998) described how perceivers can use prior knowledge about individuals or groups to infer unobserved or novel characteristics of those individuals. Both models contain a learning algorithm that allows the integration of old and new information and the subsequent storage of the resulting impression in memory. The purpose of this article is to build on the work of Smith and DeCoster by emphasizing how the recurrent model can explain many additional phenomena in impression research. In this way, we hope to contribute to the theoretical integration of the connectionist approach to impression formation. We also formulate "postdictions" that are corroborated by earlier research and, importantly, novel predictions that have subsequently been corroborated by more recent research.
Many common impression formation processes and findings can be explained within this connectionist framework, in many cases better than by the algebraic or spreading-activation models developed in the past. What are the key characteristics of connectionist models that make this possible?
First, connectionist models exhibit emergent properties such as prototype extraction, pattern completion, generalization, constraint satisfaction, and graceful degradation (all discussed at length in Rumelhart & McClelland, 1986, and Smith, 1996). It is clear that these features are potentially useful for any account of impression formation phenomena (see Smith & DeCoster, 1998). Furthermore, connectionist models assume that the development of internal representations and the processing of those representations occur in parallel by simple and highly interconnected units, in contrast to traditional models in which processing is inherently sequential. Connectionist systems therefore do not require a central executive, which eliminates the need for the explicit (central) processing of relevant social information assumed in earlier theories. Consequently, information can, in principle, be processed implicitly and automatically, without recourse to explicit conscious reasoning. Of course, this does not preclude people from being aware of the outcome of these unconscious processes.
Second, most connectionist networks (e.g., Kashima & Kerekes, 1994; Smith & DeCoster, 1998) are not fixed models but can learn over time, usually by means of a simple learning algorithm that gradually adjusts the strength of the connections between the units that make up the network. The fact that most traditional models in social psychology are not adaptive is an important limitation. Interestingly, the ability to learn incrementally brings connectionist models into broad agreement with developmental and evolutionary constraints.
Third, connectionist networks possess a degree of neurological plausibility that is generally absent in earlier statistical approaches to information integration and storage (e.g., Anderson, 1981; Busemeyer, 1991; Hogarth & Einhorn, 1992). Although connectionist models are highly simplified versions of real neural circuitry and processing, they are widely believed to reveal a number of emergent processing properties shared by real human brains. One of these properties is the integration of long-term memory (i.e., connection weights), short-term memory (i.e., internal activation), and external information (i.e., external activation). In short, there is no clear separation between memory and processing, as there is in traditional models. Although biological constraints are not strictly adhered to in connectionist models of social judgment, there is currently considerable interest in the biological implementation of social inference mechanisms (Adolphs & Damasio, 2001; Allison, Puce, & McCarthy, 2000; Cacioppo, Berntson, Sheridan, & McClintock, 2000; Ito & Cacioppo, 2001; Phelps et al., 2000), which coincides with an increasing focus on the neurophysiological determinants of social behavior.
This article is structured as follows: First, we describe the proposed connectionist model in some detail, including its precise architecture, its general learning algorithm, and the specific details of how the model processes information. In addition, we discuss a number of less well-known emergent properties of this type of network. We then present a series of simulations that apply the same network architecture to a number of distinctly different phenomena. These phenomena include primacy and recency in impression formation, the asymmetric impact of ability-related and morality-related behaviors, memory advantages for inconsistent information, assimilation and contrast in priming, and the influence of situational constraints on trait inferences.
Our survey of empirical phenomena in this area is not intended to be exhaustive, but rather to illustrate how connectionist principles can be used to shed light on the processes underlying impression formation.
CONNECTIONIST MODELS AND PERSON IMPRESSIONS
Although the focus of this article is the use of a particular connectionist model to explain a variety of phenomena in social cognition, earlier applications of connectionist modeling in social psychology (Kashima & Kerekes, 1994; Kunda & Thagard, 1996; Smith & DeCoster, 1998) are also discussed. In addition, we compare and contrast our model with a number of alternative models. Finally, we discuss the limitations of the proposed connectionist approach and point to areas where further theoretical development is ongoing or required.
The recurrent model
In this article we use the same basic network model throughout, namely the recurrent linear auto-associator developed by McClelland and Rumelhart (1985; for introductory texts, see McClelland & Rumelhart, 1988, p. 161ff.; McLeod, Plunkett, & Rolls, 1998, p. 72ff.). This model is already familiar to a number of social psychologists working on person and group impressions (Queller & Smith, 2002; Smith & DeCoster, 1998; Van Rooy et al., 2003), causal attribution (Read & Montoya, 1999), and attitude formation (Van Overwalle & Siebler, 2002). We chose to use a single basic model to emphasize the theoretical similarities underlying a variety of processes in person impression formation. We chose this model in particular because it is able to reproduce a wider range of phenomena than other connectionist models, such as feedforward networks (e.g., Van Overwalle, 1998; Van Overwalle & Jordens, 2002), parallel constraint satisfaction models (e.g., Kunda & Thagard, 1996), or tensor product models (e.g., Kashima & Kerekes, 1994; Kashima et al., 2000).
We believe one of the strengths of our approach is that, despite the great flexibility of connectionist models, which are sometimes considered theoretically vacuous because they are too powerful as models of human cognition, we actually make little use of that flexibility. As we will show shortly, the parameters of our model are not fitted arbitrarily to every situation. Rather, only the learning rate and the assumed order of the learning input vary from problem to problem. This makes the network directly testable, since assumptions about the order of the input information and about the learning rate (e.g., depth of encoding) can, at least in principle, be tested empirically. This stands in marked contrast to parallel constraint satisfaction models, in which the key assumptions concern not potentially observable inputs but unobservable internal structures such as connection weights, which researchers can set by hand but cannot test directly.
The auto-associative network differs from other connectionist models in its architecture (how the elements of the model are organized) and in the manner in which information is processed and consolidated in memory (the activation update and the learning algorithm). We discuss each of these points in turn.
The architecture of the linear auto-associative network used in this article is shown in Figure 1. Its most salient feature is that all nodes are connected to all other nodes. Thus, all nodes (senders and receivers) exchange activation with each other, but not with themselves (i.e., there are no self-connections).
In this type of recurrent network, information processing occurs in two phases. During the initial activation phase, each node in the network receives activation from the environment, termed the external input. Because the nodes are interconnected, this activation spreads through the network in proportion to the weights of the connections to the other nodes. The activation arriving from the other nodes is called the internal input (for each node, it is computed by summing all activations arriving at that node). This activation is updated over a number of cycles through the network. Together with the external input, the internal input determines the final activation pattern of the nodes, which reflects the short-term memory of the network. Typically, activations from external sources are limited to values between -1 and +1, although activation levels within the network may exceed these bounds (e.g., because external activation is amplified by internal input). In addition, the starting weights are also limited to values between -1 and +1 and, like the activations, may grow beyond these bounds.
In the linear version of activation spreading in the auto-associator that we use here, the final activation of each node at each cycle is the linear sum of the external and internal
VAN OVERWALLE & LABIOUSE
Figure 1. Generic architecture of an auto-associative recurrent network.
input. In nonlinear versions used by other researchers in social psychology (Read & Montoya, 1999; Smith & DeCoster, 1998), the final activation is determined by a nonlinear combination of external and internal input (typically a sigmoid function). However, in our simulations we found that the linear version, with a single internal update cycle, often reproduced the observed data slightly better, for reasons we discuss later. We therefore used the linear variant of the auto-associator for all reported simulations.
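To make the activation phase concrete, the linear update just described can be sketched in a few lines of Python. This is our own illustrative sketch, not code from the article; the weight matrix and activation values are hypothetical.

```python
import numpy as np

def activation_update(ext, W, cycles=1):
    """Linear activation update of an auto-associative network:
    each node's internal input is the weighted sum of the other
    nodes' activations, and its final activation is the linear
    sum of external and internal input. W has a zero diagonal
    (no self-connections)."""
    act = ext.copy()
    for _ in range(cycles):
        internal = W.T @ act      # internal input to every node
        act = ext + internal      # linear sum of external + internal
    return act

# Hypothetical example: two nodes joined by a mutual weight of 0.5.
W = np.array([[0.0, 0.5],
              [0.5, 0.0]])
ext = np.array([1.0, 0.0])
out = activation_update(ext, W)
print(out)  # the second node receives 0.5 internal input: [1.0, 0.5]
```

With more than one cycle, the internal input would be recomputed from the updated activations, letting activation reverberate through the network; the article's simulations use a single cycle.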
Consolidation in memory
After the initial activation phase, the recurrent model enters a second, learning phase in which the short-term activations are consolidated into long-term changes of the connection weights. Essentially, these weight changes are driven by the error between the internal activation generated by the network and the external input received from outside sources. This error is reduced in proportion to a learning rate that determines how quickly the network changes its weights (set between 0.10 and 0.35 in all reported simulations). This error-reducing mechanism is known as the delta learning algorithm (McClelland & Rumelhart, 1988).
Thus, when the network overestimates a node's external input, this means that the node has received too much internal input from the other nodes through their interconnections. To correct this, the delta algorithm reduces the weights of these connections. Conversely, if the network underestimates the external input, it has received too little internal input and the weights are increased. These weight changes allow the network to approximate the external input more closely. The delta algorithm thus strives to match the network's internal predictions to the actual state of the external environment as closely as possible and stores this information in the connection weights (see Appendix A for further details).
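The error-reducing weight update can likewise be sketched in code. This is our own minimal implementation of the delta rule as described above; the learning rate and input values are illustrative.

```python
import numpy as np

def delta_update(W, ext, lr=0.20):
    """One learning trial of the delta algorithm for an auto-associator:
    weights change in proportion to the error between each node's
    external input and the internal input it receives from the other
    nodes. A sketch; lr is the learning rate (0.10-0.35 in the article)."""
    internal = W.T @ ext                # internal input (one update cycle)
    act = ext + internal                # short-term activation
    error = ext - internal              # under-/overestimation of external input
    W = W + lr * np.outer(act, error)   # weight from sender j to receiver i
    np.fill_diagonal(W, 0.0)            # keep self-connections at zero
    return W

# Illustrative trial: two co-occurring concepts, weights starting at zero.
W = delta_update(np.zeros((2, 2)), np.array([1.0, 1.0]))
print(W[0, 1])  # the mutual connection grows to 0.2 after one trial
```

When the internal input overshoots the external input, the error turns negative and the same line of code reduces the weights, exactly as described in the text.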
Emergent connectionist principles
Before turning to the social phenomena of interest, it is useful to briefly discuss the basic principles or mechanisms underlying many of our simulations. These principles are emergent properties of the delta learning algorithm and include acquisition, competition, and diffusion. Some of these principles have been documented in earlier work on social cognition (Van Overwalle, 1998; Van Overwalle & Van Rooy, 1998, 2001, 2003; Van Rooy et al., 2003). Because they are essential for understanding our simulations, we first describe these principles here and discuss their application to impression formation in more detail in the simulations that follow.
Acquisition property and sample size
The acquisition property has implications for sample size that have been documented in many areas of social cognition. For example, when people receive more supportive information, they tend to form more extreme impressions of others (Anderson, 1976, 1981), to make more extreme causal judgments (Baker, Berbier, & Vallée-Tourangeau, 1989; Försterling, 1992; Shanks, 1985, 1987, 1995; Shanks, Lopez, Darby, & Dickinson, 1996), to make more polarized group decisions (Ebbesen & Bowers, 1974; Fiedler, 1996), to endorse hypotheses more strongly (Fiedler, Walther, & Nickel, 1999), to make more extreme predictions (Manis, Dovalina, Avis, & Cardoze, 1980), and to agree more with persuasive messages (Eagly & Chaiken, 1993).
One of the most striking features of connectionist models using the delta algorithm is that learning is modeled as an incremental, online process of adapting existing knowledge to novel information. This property was already exploited in the associative learning models that preceded connectionism, such as the popular Rescorla-Wagner model (Rescorla & Wagner, 1972) of animal conditioning and human contingency judgments.
The Rescorla-Wagner model (Rescorla & Wagner, 1972) predicts that when a cue (i.e., a conditioned stimulus) is followed by an effect (i.e., an unconditioned stimulus), the organism picks up that information, resulting in a stronger cue-effect association and a stronger response when the cue is present. In humans, this also leads to stronger judgments of the causal influence of the cue (see Baker et al., 1989; Shanks, 1985, 1987, 1995; Shanks et al., 1996; Van Overwalle & Van Rooy, 2001b). Similarly, the delta algorithm predicts that the more information is received about the co-occurrence of an actor or stimulus and a trait category, the stronger their connection weights become. This results in a pattern of increasing weights as more information is processed, that is, in a sample size effect (see the illustration in Figure 2A). Most earlier algebraic models of impression formation also predict this gradual increase in judgment strength (Anderson, 1981; Busemeyer, 1991; Hogarth & Einhorn, 1992).
How are online learning and a sample size effect attained in connectionist models? Because connection weights are initialized at zero (or some arbitrary low value), connection weights are relatively small and often imprecise in the early stages of learning, and gradually become more accurate (stronger or weaker, positive or negative) as more information becomes available. The reason for this incremental learning is that the error in the delta algorithm is minimized only gradually, as a function of the learning rate. Even if the covariation between a feature and
a category is perfect, the learning rate ensures that the weights connecting the two increase by only a small fraction on each trial. Hence, multiple repetitions of the same information are required before a strong weight corresponding to this covariation emerges.
For example, Figure 2A illustrates a system with a learning rate of 0.20 in which feature A is always accompanied by a given category (i.e., a perfect correlation). Assuming that the connection weight of feature A is initialized at 0, a learning rate of 0.20 means that the initial error of underestimating this correlation is gradually corrected by increasing the connection weight by 20% of the error. This causes the weight to increase gradually with each trial, starting at 0.20 after the first trial and eventually reaching an asymptotic value of +1 after a number of trials. Note that parallel constraint satisfaction models (Kunda & Thagard, 1996; Read & Marcus-Newhall, 1993; Shultz & Lepper, 1996) lack a learning algorithm and therefore cannot make this acquisition prediction.
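The acquisition curve of Figure 2A can be reproduced in a few lines. This is a minimal sketch of our own, assuming a perfect feature-category covariation (target activation of 1 on every trial) and the learning rate of 0.20 from the text.

```python
# Growth of a single feature -> category weight under the delta rule.
lr, w = 0.20, 0.0
history = []
for trial in range(20):
    error = 1.0 - w        # the category's activation is underestimated
    w += lr * error        # correct 20% of the error on each trial
    history.append(w)

print(history[0])          # 0.2 after the first trial
print(history[-1] > 0.95)  # True: the weight approaches the asymptote of +1
```

Each step closes 20% of the remaining gap, so the weight follows a negatively accelerating curve toward +1, which is the sample size effect discussed above.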
It has been demonstrated that, given a sufficient number of trials, the delta learning algorithm converges to the same predictions as traditional rule-based algebraic models of causal reasoning (Chapman & Robbins, 1990; Sarle, 1994; Van Overwalle, 1996). More important, using the same logic, it can be shown that the delta algorithm also converges to Anderson's (1981) weighted averaging model (see Appendix B). That is, the delta algorithm predicts that in the early stages of learning a person's impressions grow increasingly stronger, as if the information were added up (e.g., Betsch, Plessner, Schwieren, & Gütig, 2001), whereas after more information is received, the impression is characterized by a weighted average of the information. Moreover, it is easy to show that other algebraic models (Busemeyer, 1991; Hogarth & Einhorn, 1992) are mathematically identical to a simplified version of the delta algorithm used in connectionist models that deals with only one cause at a time (for evidence, see Van Overwalle & Van Rooy, 2001b; Wasserman, Kao, Van Hamme, Katagiri, & Young, 1996). Because these models can be regarded as special cases of the more general delta algorithm, but with less computational power, we largely ignore them in the remainder of this article.
Competition property and discounting
Another important feature of the delta algorithm is that it allows competition between connections. This competition favors the more predictive or diagnostic features. The term competition stems from the aforementioned associative learning literature on animal conditioning and causality judgments (Rescorla & Wagner, 1972; Shanks, 1995) and should not be confused with other uses in the connectionist literature, such as competitive networks (McClelland & Rumelhart, 1988). A typical example of competition is the phenomenon of discounting in causal attribution. When one cause is attributed strong causal weight, observers tend to downplay alternative causes (Hansen & Hall, 1985; Kruglanski, Schwartz, Maides, & Hamel, 1978; Rosenfield & Stephan, 1977; Van Overwalle & Van Rooy, 1998, 2001a; Wells & Ronis, 1982). In impression formation, information about the situational context in which a particular behavior took place often (but not always) leads perceivers to discount the actor's traits (Gilbert & Malone, 1995; Trope & Gaunt, 2000).
Competition arises naturally in associative learning models such as the Rescorla-Wagner model (Rescorla & Wagner, 1972), where it is termed blocking. One of the reasons the Rescorla-Wagner model is so popular is that it was among the first conditioning models to predict this property. As several researchers have noted (Read & Montoya, 1999; Van Overwalle, 1998), the delta algorithm makes similar predictions. Another well-known connectionist learning algorithm, the
Figure 2. Graphical illustration of the principles of (A) acquisition (with learning rate 0.20), (B) competition, and (C) diffusion. T = trait; M = frequent (many) behaviors; I = infrequent behaviors. Filled nodes are activated on a given trial; empty nodes are not activated. Solid lines denote strong connection weights, dashed lines moderate weights, and dotted lines weak weights.
Hebbian algorithm, used in the tensor product model of Kashima and Kerekes (1994), does not possess this property and therefore cannot make straightforward predictions of discounting.
How does this property work in connectionist models? Competition is driven by the connections that link multiple determinants (actors, situational factors, and so on) to a category node (e.g., an implied trait; see the upward arrows in Figure 2B). The crucial mechanism is that the activation of the trait node is determined by the sum of the activations received from all determinant nodes. The left panel of Figure 2B depicts several possible determinants of trait T1. Because of these multiple activations, trait T1 becomes overactivated, producing larger (negative) errors in the delta algorithm that block, or even reverse, further growth of the connection weights to T1. In contrast, the connection of a single determinant with T2 in the right panel of Figure 2B is free to grow until it reaches asymptote, because no other activations compete with it and delay acquisition.
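The blocking logic of Figure 2B can be sketched as follows (our own illustration; the number of trials and the learning rate are arbitrary). Two determinants that always accompany trait T1 must share the prediction of that trait, so each of their weights stalls at roughly half the value reached by a lone determinant predicting T2.

```python
import numpy as np

lr = 0.20
w_shared = np.zeros(2)  # two determinants -> trait T1 (left panel)
w_alone = 0.0           # one determinant  -> trait T2 (right panel)

for _ in range(50):
    # T1's activation is the SUM of both determinant->trait weights,
    # so the trait is soon over-predicted and further growth is blocked.
    error_t1 = 1.0 - w_shared.sum()
    w_shared += lr * error_t1
    # T2 is predicted by a single determinant, free to reach asymptote.
    error_t2 = 1.0 - w_alone
    w_alone += lr * error_t2

print(np.round(w_shared, 2))  # each competitor settles near 0.5
print(round(w_alone, 2))      # the lone connection approaches 1.0
```

The shared weights stop growing as soon as their sum predicts the trait, which is exactly the discounting pattern described in the text.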
Diffusion property and memory for inconsistent information
Another property of the delta algorithm is responsible for the weakening of connections when a single trait node is connected to many behavior nodes that are each activated only occasionally. This property is invoked to explain the enhanced recall of trait-inconsistent (as opposed to consistent) behavioral information in impression formation (Hastie & Kumar, 1979). Enhanced memory for inconsistent behavior has traditionally been explained in terms of a spreading-activation model of memory, in which inconsistent information is processed more deeply so that it develops stronger lateral connections to other behavior nodes (Hastie & Kumar, 1979). Other researchers have argued that enhanced memory for unique information is due to a fan effect, in which a fixed amount of activation is shared among the connections fanning out from a node: the more connections there are, the less activation each receives (Anderson, 1976). The diffusion property, however, is a fundamentally different mechanism. Whereas the fan effect involves the distribution of activation, diffusion involves the distribution of weights. It is a novel property in the associative learning and connectionist literature that, to our knowledge, has not been documented or discussed before.
How does this diffusion principle explain superior recall of inconsistent information? In contrast to the competition property, the diffusion effect is driven by the trait → behavior connections (see the downward arrows in Figure 2C). The basic mechanism is that, while an actor's behaviors are being learned, each node reflecting a specific behavior is activated only once together with the trait node and remains inactive while the other behaviors are activated with the same trait node.
This prolonged inactivation leads to a weakening of the trait → behavior connections. This is not due to a spontaneous decay of connection weights. Rather, the mechanism is that each behavior node (except the last one) is inactive at some point during learning after having been active earlier. This inactivity of the behavior node is unexpected for the network and therefore leads to a weakening of the trait → behavior connections. This process continues for all behavior nodes (except the last one activated), leading to an overall weakening of the trait → behavior connections.
To illustrate, after an initial behavior is observed, the connection between trait and behavior gains strength (through the acquisition property). When the second behavior is presented, the first behavior node is inactive while the trait node is still active, and consequently the strength of the connection between the trait and the first behavior is reduced. This reduction continues for all subsequent behaviors (see the left panel of Figure 2C for a schematic illustration), so that the earliest connections end up with the weakest weights. Compared with the many consistent behaviors implying trait T1, the inconsistent behaviors implying the opposite trait T2 are by definition fewer in number. Hence, there is less inactivation, and thus less weakening, of the inconsistent trait → behavior connections (with T2) than of the consistent connections (with T1). This unequal weakening, or diffusion, therefore leads to better recall of inconsistent information.
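This diffusion mechanism is easy to demonstrate numerically. The following is a minimal sketch of our own with hypothetical values: one trait node that is active on every trial, five behaviors presented one at a time, and a learning rate of 0.20.

```python
import numpy as np

lr, n = 0.20, 5
w = np.zeros(n)  # trait -> behavior connection weights

for trial in range(n):
    target = np.zeros(n)
    target[trial] = 1.0  # only the current behavior is active
    # With the trait active, each behavior node's activation is
    # predicted by its trait->behavior weight; behaviors that are
    # unexpectedly absent produce a negative error that weakens
    # their connections.
    w += lr * (target - w)

print(np.round(w, 3))  # earliest behaviors end up with the weakest weights
```

Each behavior's weight first jumps to 0.20 and is then eroded on every later trial in which it is absent, so the weights increase monotonically from the first behavior presented to the last. Because inconsistent behaviors are fewer, their trait → behavior connections suffer less of this erosion, which yields their recall advantage.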
Overview of the simulations
We applied the three emergent connectionist processing principles to a number of classic findings in the social cognition literature. To illustrate each phenomenon, we simulated a well-known experiment. Table 1 lists the topics of the simulations to be reported and the relevant empirical data we attempted to reproduce, together with the major underlying processing principle responsible for generating the data in the simulation. Although not all relevant findings on impression formation can be covered in a single article, we believe we have included some of the most relevant phenomena in the current literature.
Essentially the same methodology was used in all simulations. The specific conditions and experimental procedures of the focal experiments were reproduced as faithfully as possible, although minor changes were sometimes introduced for reasons of simplicity (e.g., fewer trials than in the actual experiments). For each simulation, the auto-associative network was run 50 times (i.e., 50 participants were simulated) with a random or fixed trial order (as in the actual experiment), and the results were then averaged across the 50 runs.
The architecture of the network
The concepts of interest in the simulations, such as actors, traits, and behaviors, are each represented by a single node. This is a localist encoding, in which each node reflects one "symbolic" concept. In contrast, in a distributed encoding, as used by Kashima and Kerekes (1994) and Smith and DeCoster (1998), a concept is represented by a pattern of activation across an array of nodes, none of which reflects a symbolic concept but rather some subsymbolic microfeature of it (Thorpe, 1994). We are aware that localist encoding lacks biological plausibility, as it implies that each concept is stored in a single processing unit and, except for explicitly different levels of activation, is always perceived by the network in an identical manner. This encoding scheme was chosen as a simplifying assumption to demonstrate the power of our model; we show at the end of this article that distributed representations yield approximately the same results.
Representation of traits. We assume that behaviors or actors are naturally categorized in terms of at least two opposing trait categories (i.e., represented by two nodes in a localist encoding). For example, a person's performance can be characterized as "stupid" or "intelligent." Finer-grained categorizations, for example according to different levels of intelligence, are also possible. However,
observers in experimental settings are sometimes compelled to make one-dimensional judgments, for instance when the experimental instructions ask participants to make a judgment on a single intelligence or likableness scale. When such task demands dominate the impression task, particularly when participants are requested to make repeated one-dimensional judgments, we assume that observers are likely to represent their judgment in terms of a single integrative trait concept (i.e., represented by a single node in a localist encoding). This one-dimensional representation is used only in the first two simulations, to be discussed shortly, but it is crucial for reproducing some of the phenomena of interest.
How might such a novel unitary concept be created in human memory? Research in neuropsychology has found that traces of novel episodic events or concepts are stored in the hippocampus, and recently connectionist modelers have begun to model these processes (e.g., O'Reilly & Munakata, 2000, pp. 287-293; O'Reilly & Rudy, 2001). The basic idea is that novel information or concepts consist of a unique combination of existing features, and that this configuration is temporarily stored in a hippocampal layer by representing each unique event or concept through an internal representation (e.g., of constituent features). Because such a detailed network is beyond the scope of this article, we included in our simulations only a single
VAN OVERWALLE & LABIOUSE
Table 1. Overview of the simulated topics in person impression formation and the underlying network property

Person impression formation
1. On-line integration
   Finding: More extreme judgments given more evidence on positive or negative traits.
   Property: Acquisition of actor→trait weights.
2. Serial position weighting
   Finding: The last item in a series has the greatest impact on trait inferences (recency). In addition:
   • recency is attenuated after a longer list of items (Competition: a stronger context dilutes the impact of inconsistent items on the trait);
   • primacy arises when only a final trait inference is given (Acquisition: the trait→actor connection spreads consistent trait activation to the actor and reduces further learning from behaviors implying similar traits).
3. Asymmetric cue diagnosticity
   Finding: High-ability and low-morality behaviors are more diagnostic of an actor's traits than low-ability and high-morality behaviors.
   Property: Acquisition: skewed distribution of ability- and morality-related behaviors.

Recall of behavioral information
4. Recall of inconsistent behaviors
   Finding: Recall is better for trait-inconsistent behaviors.
   Property: Lower prevalence of the rare trait→behavior relationships.

Priming
5. Assimilation and contrast
   Finding: Priming with
   • a trait leads to assimilation toward that trait (Acquisition: the extra trait activation is connected to the actor);
   • an exemplar leads to contrast away from the implied trait (Competition: the exemplar→trait link competes with the actor→trait link).

Discounting by situational information
6. Integration of situational information
   Finding: A trait is discounted given situational information, especially when that information is more salient or applicable.
   Property: Competition: the actor→trait link loses against a stronger situation→trait link.
7. Discounting and sample size
   Finding: An actor's trait is discounted more when there is more evidence for an alternative actor.
   Property: Acquisition of the alternative actor→trait connection leads to competition with the target actor→trait connection.
This node represents an integrative one-dimensional representation of a trait. To clarify the most basic learning principles, we used only this integrative trait node and ignored the two opposing trait categories (i.e., represented by two nodes in a localist encoding). Although these opposing trait categories most likely continued to play a role in learning, since they appear to be the most natural representation, adding them to the simulations did not change the results appreciably (in fact, it did little more than improve the fit with the observed data).
Context node. In virtually all simulations involving judgments of an actor, the target actor was accompanied by a general context or specific comparison actors. This serves as a comparison standard for assessing the actor's level on a trait, which is a crucial process in social cognition. Because there are no objective standards for judging people's behaviors and opinions, observers need a social standard for their judgments. This idea was perhaps best developed in attribution theory (Kelley, 1967). A wealth of research has shown that attributions to an actor depend on low consensus, that is, the extent to which the person's behavior differs from that of comparison persons. When the behavior is similar, we attribute it not to the person but to an external circumstance or context (Kelley, 1967; Van Overwalle, 1997). Thus, other people provide a comparison standard that is often internalized as norms for different target categories (e.g., gender, social groups). Comparisons can be made against general standards or specific comparison actors, depending on task instructions and the availability of specific comparison others. In other words, context is necessary to determine whether the actor is responsible for his or her behavior, and hence to be able to infer character traits. By indicating how common a behavior is in general, context identifies the relevant social norms or standards; how far the actor's behavior deviates from this norm is an indication of the actor's underlying traits.
What connectionist mechanism allows a context to serve as a comparison standard when inferring traits? The underlying mechanism is the principle of competition. When the context develops a strong connection with a trait, because it is paired with trait-implying behavior as often as (or more often than) the actor, it competes with the actor→trait connection, resulting in weaker trait inferences. Conversely, when the context develops a weaker connection with the trait, because it is paired with the behavior less often than the actor, the context cannot crowd out the actor→trait connection, resulting in stronger trait inferences.
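The competition principle can be sketched with a two-predictor delta rule. This is a deliberately stripped-down version with our own variable names (the full model updates an entire weight matrix over recurrent cycles), but it shows how a context that co-occurs with the behavior absorbs part of the trait evidence:

```python
def delta_trial(w_actor, w_context, actor, context, trait, rate=0.3):
    """One delta-rule learning trial: actor and context jointly predict
    the trait, and both connections are adjusted by the shared error."""
    error = trait - (w_actor * actor + w_context * context)
    return w_actor + rate * error * actor, w_context + rate * error * context

# Context paired with the trait-implying behavior as often as the actor:
wa = wc = 0.0
for _ in range(10):
    wa, wc = delta_trial(wa, wc, actor=1, context=1, trait=1)

# Same actor, but no context present to compete:
wa_alone = 0.0
for _ in range(10):
    wa_alone, _ = delta_trial(wa_alone, 0.0, actor=1, context=0, trait=1)

# wa ends near 0.5 (context absorbed half of the trait evidence),
# whereas wa_alone approaches 1.0: weaker vs. stronger trait inference.
```

Because actor and context share the same prediction error, they split the asymptotic weight between them; remove the context and the actor→trait connection approaches the full trait value.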
Contextual competition plays a role in all of our trait-judgment simulations. The implementation of the context differed between simulations. In some experiments, the context was left implicit (e.g., only trait adjectives were presented), so we used a general context node without detailing what it represents. In these cases, the context can reflect a range of aspects, from task instructions to (unspecified) cues in the actor's behavioral environment (Simulations 1 and 2). In other experiments, the context was explicitly manipulated and was defined by specific comparison actors with concrete trait descriptions or behaviors (Simulations 5, 6, and 7).
Activation and Learning Parameters
All parameters of the auto-associative model governing the spreading of activation were held constant across all simulations (cf. McClelland & Rumelhart, 1988; for the technical details of our simulations, see Appendix A). In contrast, we did not fix a common learning rate for all simulated experiments, given the different stimuli, measures, and procedures used in them. Instead, for each simulation we chose the learning rate that yielded the highest correlation with the observed data, after searching the permissible parameter range (see Gluck & Bower, 1988; Nosofsky, Kruschke, & McKinley, 1992).
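As a rough illustration of the machinery referred to here, the sketch below implements a linear auto-associator with a single internal update cycle and delta learning. This is our simplification of the McClelland–Rumelhart model; the node labels, the absence of self-connections, and the learning rate of .3 are illustrative assumptions, not the parameters of Appendix A:

```python
# Node indices in a 3-node localist network (our own labels).
ACTOR, CONTEXT, TRAIT = 0, 1, 2

def internal_activations(weights, external):
    """One linear internal update cycle: each node's internal activation
    is the weighted sum of the activations sent by the other nodes."""
    n = len(external)
    return [sum(weights[i][j] * external[i] for i in range(n) if i != j)
            for j in range(n)]

def delta_trial(weights, external, rate):
    """Delta learning: each incoming connection of node j is adjusted in
    proportion to the error between j's external and internal activation."""
    internal = internal_activations(weights, external)
    n = len(external)
    for j in range(n):
        error = external[j] - internal[j]
        for i in range(n):
            if i != j:
                weights[i][j] += rate * error * external[i]

weights = [[0.0] * 3 for _ in range(3)]
# Four trials pairing the actor (and the context) with a positive trait.
for _ in range(4):
    delta_trial(weights, [1.0, 1.0, 1.0], rate=0.3)

# Test trial: prime the actor alone and read off the trait node.
trait_activation = internal_activations(weights, [1.0, 0.0, 0.0])[TRAIT]
```

The trait activation grows toward an asymptote over trials (the acquisition principle), and priming the actor alone retrieves the learned actor→trait weight.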
Variations in learning rate are assumed to reflect differences in attention to the task. These differences may be due to modulations of basal arousal and behavioral activation (e.g., the sleep–wake cycle); to responsiveness to novel, affectively charged, motivationally relevant, or otherwise salient stimuli; or to task-specific attentional focus and voluntary control over the exploration, scanning, and encoding of information. Reviews of the neurological basis of attention suggest that general arousal is controlled by deeper nuclei and brainstem pathways, while basic stimulus properties are detected by the thalamus and related subcortical nuclei (e.g., amygdala, basal ganglia). In contrast, task-specific attention and voluntary control are most likely modulated by supervisory centers in the prefrontal neocortex (LaBerge, 1997, 2000; Posner, 1992). We felt that this wide range of attentional sources justified varying the learning rate from simulation to simulation.
In most of the simulations, differences in learning rate are mainly due to deliberate control of one's attentional focus. For example, instructing participants to memorize behavioral details or giving them an additional cognitive task (Simulations 4 and 7) diverts their attention away from trait inferences. Some connectionist researchers have begun to model these voluntary control and attention processes at a higher level (e.g., O'Reilly & Munakata, 2000, pp. 305–312, 379–410). The basic idea of their approach is that activation plays an important role in maintaining and shifting attention (through dopamine-based modulation), leading to greater accessibility and effectiveness of internal representations. If
information is actively maintained, it is readily available to other parts of the system and continuously influences the activation of other representations. Task instructions are assumed to have a direct impact on the activation of internal representations.
Since such central executive "subnetworks" are beyond the scope of this article, to simulate diminished attention to the target task we simply set the overall learning rate to a lower value, reproducing the notion that encoding and learning were inhibited and more shallow. For example, when participants were asked to memorize behaviors or were given a secondary task, we assumed that they would pay less attention to trait inferences, so that the rate at which trait-relevant connections develop would decrease. Although we could have manipulated the activation of the input nodes instead of the learning rate to simulate reduced attention (with similar results), for parsimony in parameter manipulation we varied only the learning rate (for a similar approach, see Kinder & Shanks, 2001).
In some cases, the context node was given a separate learning rate for all context→trait connections, in addition to the overall learning rate. Because, as noted earlier, the context in the experiments underlying the simulations was not always clearly defined, it was unclear what made up its representation. For example, it may consist of fewer or more relevant features than the other representations in the network, which could decrease or increase its effective learning rate. It was also unclear how much attention participants paid to contextual features compared with other information. Rather than making arbitrary ad hoc assumptions about the content or encoding of context features, and for consistency with our overall simulation approach, we estimated a separate learning rate for the context that yielded the closest fit with the observed data.
Varying the learning rate as described above does not violate the locality principle of connectionism, which states that each connection weight can be updated using only information locally available at the connected nodes. This is because the learning rate determines only the overall speed of learning in the network, not how much and in which direction a weight is adjusted, which under the delta learning algorithm is determined solely by local information. Overall, the selected learning rates were quite robust; that is, increasing or decreasing this parameter had little substantive impact on the simulations. Only when the initial learning rate was already high (≥ .25) did further increases become problematic, as the weights grew too large and unstable. In addition, this would give too much impact to new information, so that earlier information would be disregarded entirely.
At the end of each simulated experimental condition, test trials were run to simulate the empirical dependent measures, by priming (i.e., turning on the activation of) particular nodes of interest and recording the resulting output activation at other nodes. For example, to test trait inferences, the actor node was turned on and the resulting activation of the trait node (without further external activation) was read off. Similar testing procedures for the other dependent variables are explained and justified in detail for each simulation.
Our predictions were verified by comparing the resulting test activations with the observed experimental data. Because raw activation values and experimental results are difficult to compare quantitatively, we examined only the general pattern of activation and projected it visually onto the observed data (i.e., we rescaled the obtained test activations by linear regression with a positive slope). In addition, statistical tests of the differences between conditions were important. All tests involve between-subjects analyses of variance (ANOVAs) or unpaired t-tests unless noted otherwise. Such tests would be impossible in some simulations because a fixed order of trials precludes variability in the results. To avoid this, and to add more realism to our simulations, in all localist encodings we added a random value between −0.1 and +0.1 to the default starting weights.
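The rescaling step can be written out explicitly; it is plain ordinary least squares (the helper below is our own, not code from the article):

```python
def rescale(sim, observed):
    """Fit observed = a + b * sim by ordinary least squares and return
    the rescaled simulation values together with the slope b."""
    n = len(sim)
    mx = sum(sim) / n
    my = sum(observed) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(sim, observed))
         / sum((x - mx) ** 2 for x in sim))
    a = my - b * mx
    return [a + b * x for x in sim], b
```

A fit is interpretable only if the slope b comes out positive, that is, if the simulation orders the conditions in the same direction as the human data; the regression merely maps activations onto the rating scale.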
Most impression-formation processes can be viewed as categorization. That is, the social perceiver tries to decide to which trait category a person belongs by inferring traits on the basis of trait-implying information or an actor's behavior. Categorizing diverse information into meaningful trait concepts or categories that encompass similar attributes, roles, or behaviors promotes cognitive economy and organization, allowing us to go beyond the information given and to plan our behavior and interactions with social agents accordingly.
In recent approaches, categorization is usually described in terms of either a prototype or an exemplar approach. In the prototype approach, learners abstract the central tendency of each category and then classify instances according to their similarity to the category's central prototype (e.g., Rosch, 1978). In contrast, the exemplar approach does not assume such an average or typical prototype; rather, categorization depends on the similarity between the object at hand and memory traces of stored category exemplars (Fiedler, 1996; Hintzman, 1986; Medin & Schaffer, 1978; Nosofsky, 1986; Smith & Zarate, 1992). In the recurrent approach, as in most connectionist models, categorization is prototype-based.
That is, categorization proceeds by developing connections between the person and various prototypical traits. The stronger the person's connection with a particular trait category, the more likely it is that the person belongs to that category and possesses that trait.
In the following simulations, we use two experiments inspired by Anderson's (1981) weighted averaging model to illustrate how a recurrent network can model impression formation without resorting to the explicit arithmetic assumed in algebraic models. In such experiments, participants are typically given a series of trait adjectives or behavioral descriptions of an actor and are asked to report an overall trait or likableness impression of that person (e.g., Anderson, 1981; Asch, 1946; Kashima & Kerekes, 1994). We focus on representative findings from this research, which reflect
• how impressions are integrated on-line and grow stronger or weaker as participants receive information implying the positive (high) or negative (low) end of a trait, and
• when initial or final information is weighted more heavily in the impression.
In this experimental paradigm, ratings are requested on a one-dimensional rating scale from the start of the experiment and at various points during it, so we assume a single trait representation (a single trait node).
Simulation 1: On-line Integration
Consider first an experiment by Stewart (1965) in which an actor was described by four adjectives implying a high level of a trait (e.g., "chatty"), followed by four adjectives implying the opposite (low) level (e.g., "restrained"). Some participants received the high-trait information in the first half of the experiment and the low-trait information in the second half, while other participants received the reverse low–high order. In the continuous condition simulated here, participants rated the actor after each adjective on a likableness scale ranging from very unfavorable to very favorable. Consistent with the predictions of Anderson's (1981) weighted averaging model, Stewart (1965) found that likableness ratings increased when high-trait information was given and decreased when low-trait information was given (see Figure 3). In addition, he documented a recency effect; that is, later information had a somewhat greater impact on the final judgment than earlier information, as shown by the crossover at the far right of the figure.
Simulation. Stewart's (1965) experiment was modeled with a network architecture consisting of an actor node connected to a trait node, plus an additional context node reflecting situational constraints (e.g., social and group norms) or other variables of the experimental context (e.g., instructions). This context node guarantees a smooth acquisition curve consistent with Stewart's data; without it, the acquisition pattern is more angular.
What we want to demonstrate here is how trait-implying information is used to build an impression of a particular actor by changing the connection weight between the actor and the trait according to the acquisition principle. The simulation therefore assumes that the likableness implied by each adjective has already been learned and is retrieved from semantic and social knowledge. This assumption is elaborated further in a later section (Simulation 3). Specifically, we assume here that likable adjectives are denoted by an activation value of +1 at the trait node, whereas unlikable adjectives are denoted by an activation value of −1 (corresponding to Anderson's scale values).
Table 2 schematically shows the trials used to simulate the information from Stewart's (1965) experiment. When the actor is described by a positive (high) adjective, the trait node is activated with a value of +1 and the connection weight is increased according to the acquisition principle of the delta learning algorithm. Conversely, when the actor is described by a negative (low) adjective, the trait node is activated with a value of −1 and the weight is decreased, again according to the acquisition principle. After each adjective, the actor node is primed and the resulting activation of the trait node indicates which trait the actor conveys (see the bottom "Test" panel of the table).
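A bare-bones version of this design can be run in a few lines. The sketch below is a single-cycle simplification of the recurrent model, using the learning rates reported for this simulation (.32 overall, .08 for the context); it reproduces the rise, fall, and final crossover, though not the exact values of the full model:

```python
def run_order(trait_sequence, lr_actor=0.32, lr_context=0.08):
    """Present a sequence of trait activations (+1 high / -1 low) for one
    actor in a constant context; after each trial, test by priming the
    actor alone and reading off the actor->trait weight."""
    w_actor = w_context = 0.0
    ratings = []
    for trait in trait_sequence:
        error = trait - (w_actor + w_context)   # actor and context both on
        w_actor += lr_actor * error
        w_context += lr_context * error
        ratings.append(w_actor)                 # actor primed alone
    return ratings

high_low = run_order([+1] * 4 + [-1] * 4)
low_high = run_order([-1] * 4 + [+1] * 4)
```

The high–low ratings climb for four trials and then drop below zero, mirroring Stewart's curves, and the final ratings of the two orders cross over: the recency effect.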
Figure 3. On-line integration of trait-implying information: observed data from Stewart (1965) and simulation results (overall learning rate = .32, for context = .08). The human data are from Figure 1 in "Effect of Continuous Responding on the Order Effect in Personality Impression Formation," by R. H. Stewart, 1965, Journal of Personality and Social Psychology, 1, pp. 161–165. Copyright 1965 by the American Psychological Association. Adapted with permission.
Simulation results. The simulation with the recurrent network was run with 50 "participants" (i.e., 50 different simulation runs) and a fixed trial order. The results, with a learning rate of .32 (.08 for the context), are shown in Figure 3. Recall that the simulation data were rescaled by linear regression to make them directly comparable to the observed data. As can be seen, there is close agreement between the simulation and the empirical data, suggesting that the acquisition principle in our recurrent connectionist model adequately captures the on-line integration of impressions, much like Anderson's (1981) weighted averaging model. A repeated measures ANOVA showed that the differences between trials were significant in both the high–low condition, F(7, 343) = 3853.21, p < .001, and the low–high condition, F(7, 343) = 3344.23, p < .001, and one-tailed t-tests confirmed that the differences between all adjacent trials in each condition were significant, p < .001.
Of particular importance is the crossover at the end of the trial sequence. The difference between the two conditions on the last trials was significant, t(98) = 44.11, p < .001. This reflects a recency effect, whereby the most recently presented adjectives outweigh the adjectives presented earlier. Interestingly, increasing the learning rate would produce an even greater recency effect, because it would force new information to have a greater impact than older information. Overall, the results suggest that revising and adjusting person impressions is an on-line acquisition process, with new information often "overwriting" older information previously stored in the connection weights.
Simulation 2: Serial position weights
As another example, consider research in which trait-inconsistent information is presented at a single position in a series of trials. By comparing its effect against trait-consistent information at the same serial position, one can estimate the weight each piece of information has at a given position (Anderson, 1979; Anderson & Farkas, 1973; Busemeyer & Myung, 1988; Dreben, Fiske, & Hastie, 1979; Kashima & Kerekes, 1994). Early trait information can be important in crystallizing an impression (primacy effect), while late information can be influential because it sheds new light on previously presented traits (recency effect).
Most research suggests that when participants report their trait impression continuously, after the presentation of each adjective, the weights are relatively equal at all serial positions except the last, where they increase sharply. This reflects a recency effect, also observed in the previous simulation. Importantly, however, this recency effect is attenuated as more trait-implying information is provided and processed. That is, inconsistent information has a stronger recency effect when little trait-implying information has been given than when much has been given. It is as if a growing amount of confirming information immunizes the observer against inconsistent information. In contrast, when trait ratings are requested only once, after the entire series of information has been presented, primacy is more likely (for reviews, see Hogarth & Einhorn, 1992; Kashima & Kerekes, 1994).
In a typical experiment by Dreben et al. (1979), participants read about different actors, each described by four behaviors implying the same high (H) or low (L) end of a trait (e.g., HHHH, HHHL, HHLH, ..., LLLL). In the continuous condition, participants rated the actor after each behavioral description on a scale ranging from "most likable" to "least likable." In the final condition, this rating was given only after all the behavioral descriptions of an actor had been presented. The serial position weight was measured by comparing the rating of each item list with that of another list containing the same set of high or low items except for one opposite item (i.e., a behavior implying the opposite trait). This inconsistent item was placed at the first, second, third, or last position of the series to measure the weight of an item at the first, second, third, or fourth list position. For example, denoting by x, y, and z behaviors implying the same (high or low) end of the trait, the weight of serial position 1 is measured by the mean rating difference between all item lists Hxyz and Lxyz. Likewise, the weight of serial position 4 is measured by the mean difference between all item lists xyzH and xyzL. The results were as expected (see Figure 4). Under continuous ratings, a recency effect emerged that became weaker toward the end of the list, as indicated by the dotted line showing recency attenuation. Conversely, a primacy effect was observed after a single final rating.
Table 2. On-line integration of trait-implying information (Simulation 1)

                                           Actor   Context   Trait
Condition 1: High–Low presentation order
  #4  High trait adjective                   1        1       +1
  #4  Low trait adjective                    1        1       −1
Condition 2: Low–High presentation order
  #4  Low trait adjective                    1        1       −1
  #4  High trait adjective                   1        1       +1
Test
  Trait of actor                             1        0        ?

Note: Schematic representation of the experimental design of Stewart (1965). High = adjective implying the trait; Low = adjective implying the opposite trait; # = number of trials. Cell entries denote external activation. The simulation was run separately for each condition.
Simulation. For the experiment by Dreben et al. (1979), we used the same recurrent architecture as before. The learning trials are shown schematically in Table 3. The table shows only the fourth serial position; the other serial positions were simulated by changing the position of the inconsistent item in the item lists. The likableness rating was simulated by priming the actor node and reading off the activation of the trait node after each trial. The serial position weight was measured as in the original experiment, by computing the mean difference between the resulting trait activations of item lists that differed in only one item.
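The serial position weights can be computed with the same kind of single-cycle sketch used before (learning rates .25 and .025 as reported; for brevity only the all-high item lists are used, whereas the experiment averaged over high and low lists). Note that this linear simplification reproduces the recency pattern, with weights growing toward the last position, but the attenuation of recency requires the full recurrent activation dynamics:

```python
def final_rating(trait_sequence, lr_actor=0.25, lr_context=0.025):
    """Final actor->trait weight after presenting a list of behaviors
    (+1 high / -1 low) with the actor and a constant context."""
    w_actor = w_context = 0.0
    for trait in trait_sequence:
        error = trait - (w_actor + w_context)
        w_actor += lr_actor * error
        w_context += lr_context * error
    return w_actor

def position_weight(pos):
    """Weight of serial position pos (0-3): rating difference between
    two lists of four high items that differ only at that position."""
    hi, lo = [+1] * 4, [+1] * 4
    lo[pos] = -1
    return final_rating(hi) - final_rating(lo)

weights = [position_weight(p) for p in range(4)]
```

The weights increase monotonically from the first to the last position, the recency effect obtained under continuous responding.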
Simulation results. The recurrent simulation was run with 50 participants and a fixed trial order.
The results, with a learning rate of .25 (and .025 for the context), are shown in the top panel of Figure 4. The recurrent network clearly reproduced the predicted recency effects (indicated by the solid lines). For each liking rating, t-tests confirmed that the weights at the last position were significantly higher than at the earlier positions, ts(98) = 2.37–12.21, ps < .05. More important, there was also an attenuation of recency: a one-way ANOVA showed that the weight of the last serial position (indicated by the dotted line) decreased significantly depending on whether ratings were given at the beginning (first) or the end (fourth) of the item list, F(3, 196) = 9.26, p < .001.
As we saw in the previous simulation, recency is a natural consequence of acquisition in a connectionist network, because later information tends to overwrite earlier information stored in the connection weights. The more crucial question is how the network produced the attenuation of recency. The context plays a crucial role here. Because the context is always paired with the implied trait, the context→trait weight grows stronger as more information is received and therefore competes more strongly with the person→trait weight. When inconsistent items appear later in a series, they tend to be discounted more because of the increasing (positive) influence of the context. In other words, a strong impression built up from prior information in the same context makes the observer more resistant to changing that impression in the face of an inconsistent item. This explanation differs from Anderson's account, which rests on a distinction between item-specific and abstract aspects of impression formation (Anderson & Farkas, 1973; Dreben et al., 1979), but resembles the tensor product simulation of Kashima and Kerekes (1994) in emphasizing the role of the judgment context.
Figure 4. Serial position weighting: observed weight curves from Dreben, Fiske, and Hastie (1979; left panels) and simulation (right panels) of the attenuation of recency under continuous responding (top; overall learning rate = .25, for context = .025) and of primacy under final responding (bottom; overall learning rate = .25, for context = .00). The human data are from Figure 1 in "The Independence of Evaluative and Item Information: Impression and Recall Order Effects in Behavior-Based Impression Formation," by E. K. Dreben, S. T. Fiske, and R. Hastie, 1979, Journal of Personality and Social Psychology, 37, pp. 1758–1768. Copyright 1979 by the American Psychological Association. Adapted with permission.
Table 3. Serial position weights (Simulation 2)

                                           Actor   Context   Trait
Item list
  #3  High or Low                            1        1       +1 or −1
  #1  High                                   1        1       +1
Item list with opposite trait item
  #3  High or Low (same as above)            1        1       +1 or −1
  #1  Low^a                                  1        1       −1
Test
  Trait of actor                             1        0        ?

Note: Schematic representation of the experimental design of Dreben et al. (1979). High = behavior implying the high extreme of a trait; Low = behavior implying the opposite extreme of a trait; # = number of trials. Cell entries denote external activation; the initial weights were set to 0.05 (0.02 for distributed encoding). The simulation was run separately for each item list. ^a This trial is presented at position 1, 2, 3, or 4 of the series (here at position 4), and the resulting test activations are averaged over all item lists with the target trait. Differences are taken with all opposite-trait lists to measure the serial position weight.
Let us now turn to the primacy effects that typically emerge when impression judgments are given at the end of the series rather than continuously (Anderson, 1979; Hogarth & Einhorn, 1992). Perhaps the simplest way to obtain a basic primacy effect is to reduce the learning rate of the context node to zero, or to ignore the context altogether (i.e., to set the external activation of the context to zero). The bottom panel of Figure 4 shows the simulation results when the context learning rate is set to zero. A one-way ANOVA showed that the differences between the trial orders were significant, F(3, 196) = 29.32, p < .001.
How did recency disappear when the context's learning rate or external activation was reduced to zero? An analysis of the simulation suggests the following explanation. After a few trials, the person node and the trait node develop strong connections to each other in both directions (cf. the acquisition principle). When an inconsistent trial is then presented, the negative (external) activation propagates from the trait node to the person node, and the net activation of the person node decreases. (This does not happen when a context is present, because the context sends positive activation to the actor node, preventing the drop in the person node's net activation.) The same logic applies to the trait node: the positive activation spreading from the person node reduces the negativity of the trait node's net activation. Because of the reduced net activations of the person and trait nodes, the person→trait connection is adjusted only slightly. In other words, on later inconsistent trials, the opposing activations of person and trait tend to cancel each other out, leading to little learning and hence to primacy. This cancellation is greater on later trials, when person and trait have developed stronger connections. Note that this mutual cancellation is possible only under the assumption of a single integrative trait node rather than two separate opposing trait nodes; otherwise this primacy effect would not occur.
A more direct way to simulate primacy is to assume a higher learning rate at the beginning of the item list, which then gradually decreases. This account shares the view of Anderson's (1981) attention decrement hypothesis that most attention to, and assimilation of, information occurs during the earliest trials, while information presented later has little effect. Such a procedure was applied in the tensor product simulation of primacy by Kashima and Kerekes (1994). However, because this forces a primacy effect to emerge rather than deriving it from underlying processing mechanisms, the question of what causes the attention decrement remains open.
In sum, on the basis of our connectionist simulations, we can explain the differential effects of continuous and final judgments by differences in the learning or encoding of the context. We expect that when observers are made more accountable for their impressions, for instance by being asked to judge repeatedly, they encode and process the information more carefully and take greater account of contextual cues in the actor's situation or the experimental setting. This may be less the case when a trait judgment is requested only once, resulting in more neglect of the context and thus in primacy rather than recency.
Discussion and further research. Our recurrent model reproduced both recency and primacy effects. Although Kashima (Kashima & Kerekes, 1994; see also Busemeyer & Myung, 1988) argued that recency attenuation could not be simulated with a feedforward network (which is correct), or even with a recurrent network (Kashima et al., 2000), we have shown here that it can readily be reproduced with a recurrent connectionist model. The tensor product model proposed by Kashima and Kerekes was also able to reproduce these effects, but it required additional assumptions, such as a changing context after each judgment, to obtain recency attenuation. In our model this was not necessary, since the mere presence of the context was sufficient.
We propose that encoding contextual cues in the presence of the actor allows later information to have greater impact (recency), whereas ignoring the context is responsible for fixing an impression quickly (primacy). Furthermore, we propose that the dampening of recency is due to increased discounting of context-inconsistent information. There is some support for our hypothesis that recency is driven by greater attention to the information (including the context). This is evident from research showing that conditions that promote greater accuracy or motivation lead to more recency rather than primacy. A series of studies by Kruglanski and colleagues (Freund, Kruglanski, & Shpitzajzen, 1985; Heaton & Kruglanski, 1991; Kruglanski & Freund, 1983) documented that conditions promoting accurate judgments reduce primacy. Similarly, Gannon, Skowronski, and Betz (1994) found that depressed individuals, who are more motivated to process information, show more recency.
Future research could lend further support to the recurrent network's predictions. According to our network, giving the context more weight in explaining the actor's behavior should increase recency rather than primacy. The more constant the context, the more recency should be dampened, since an unchanging context node becomes stronger and therefore tends to discount more inconsistent information.1
VAN OVERWALLE & LABIOUSE
1. Dreben et al. (1979) also reported serial position effects on memory. Regardless of the response condition (continuous or final), they found primacy mainly for the first two items and recency for the last item. Recall of the behavioral information can be simulated by extending the network (with its original trait and counter-trait features) with nodes reflecting each unique piece of behavioral information (see also Simulation 4). With this approach, the reported primacy effect could be reproduced, but not the recency effect.
Inferring traits from corresponding behavior
In the previous simulations, we ignored the question of how information about an actor's behaviors or features is used to make a trait judgment. We simply assumed that the trait or evaluative meaning implied by these behaviors and features was transferred directly to the actor. Although this simplification may capture essential aspects of person impression formation, two questions remain unanswered. First, are all behaviors equally characteristic of a trait; that is, are some behaviors more readily used than others when drawing a trait inference? Second, can this process of inferring associations between behaviors and traits be modeled by a recurrent network? In the next simulation, we attempt to answer these questions.
Simulation 3: Asymmetry in ability and morality inferences
A notable finding in research on dispositional inference is that the diagnosticity of a given behavior varies with the content of the inference domain. For example, because observers in the ability domain normally expect that a high level of performance can be attained only by a person with high ability, an actor's high performance should evoke the inference of high ability. In contrast, a low level of performance should leave the observer more uncertain, as it can indicate both low and high ability; even a person with high ability sometimes fails. For inferences about morality, this pattern is completely reversed. Because observers expect that a low level of moral behavior will be displayed only by a person with low morality, an actor's immoral act should lead to inferences of low morality. A high level of moral behavior, on the other hand, can indicate both low and high morality, since even immoral individuals often behave morally. These behavior-trait expectations lead to an asymmetry in the diagnosticity of ability-related and morality-related behavior: High performance is more diagnostic for ability inferences, whereas immoral behavior is more diagnostic for morality inferences. Furthermore, extreme behavior is generally considered more characteristic of persons with extreme traits, whereas moderate behavior can be characteristic of both extreme and moderate traits (Lupfer, Weeks, & Dupuis, 2000; Reeder, 1997; Reeder & Fulks, 1980; Reeder & Spores, 1983; Skowronski & Carlston, 1987; Wojciszke, Brycz, & Borkenau, 1993; for reviews see Reeder & Brewer, 1979; Skowronski & Carlston, 1989).
Where do these asymmetric behavior-trait expectations come from? One explanation that has received considerable empirical support is the cue diagnosticity interpretation of Skowronski and Carlston (1989). Consistent with our view, these authors posit that behavioral cues are used to classify an actor into one or more trait categories. Behavior that strongly favors one trait category (e.g., dishonest) over alternative categories (e.g., honest) is said to be more diagnostic. The asymmetry is thought to result from different associations between behaviors and categories of ability and morality traits. According to Skowronski and Carlston, highly immoral actors may rob banks, but they may also help an old woman cross the street; moral actors, in contrast, may lie about their age, but they will never rob banks. Conversely, a high jumper with excellent ability sometimes clears 7 feet and sometimes fails, whereas a poor high jumper will probably never clear 7 feet. Thus, immoral behavior is more strongly associated with immorality than moral behavior is with morality, whereas competent behavior is more strongly associated with high ability than incompetent behavior is with low ability.
A possible interpretation of the cue diagnosticity notion could be based on the semantic meaning of ability and morality traits. This argument relies on the semantics of high or low ability and morality to infer which behaviors are most likely. However, it leaves unanswered the question of how this semantic knowledge is acquired in the first place. From a developmental perspective, it seems more likely that young children learn to correct a semantic error such as overgeneralization (e.g., calling all adult males "dad") not because they are explicitly told the correct meaning, but because they experience for themselves under what circumstances people call someone "dad." A more interesting interpretation, therefore, is that the asymmetric strength of the behavior-trait association is due to an asymmetric distribution of prior relevant observations in the morality and ability domains. These observations may be direct or indirect (e.g., when other people talk about their own experiences or observations) and may only later be incorporated into the semantic meaning of ability and morality traits.
The aim of the following simulation is to show that a skewed distribution during prior learning can produce differential diagnosticity, as demonstrated in a study by Skowronski and Carlston (1987, Experiment 1). In this study, participants received descriptions of positive and negative behaviors reflecting five different levels of intelligence and morality. Examples of extreme moral behavior were robbing a store (low) and returning a lost wallet (high); examples of extreme ability-related behavior were failing most exams (low) and teaching at a university (high). For each behavior, participants rated the extent to which "a person with [trait] would ever [behave]" (Skowronski & Carlston, 1987, p. 691), with traits such as dishonest, honest, intelligent, and stupid.
CONNECTIONIST MODEL OF PERSON IMPRESSION FORMATION
Simulation. A possible (simplified) learning history that might reflect participants' prior acquisition of relevant behavior-trait knowledge is presented in Table 4. The table shows five levels of ability-related and morality-related behavioral features, together with the implied high or low trait. Extreme behavior is characterized by a configuration of extreme to neutral features, moderate behavior by a configuration of moderate and neutral features, and neutral behavior by a configuration of neutral features only. For example, extremely competent behavior involves not only neutral steps, such as writing a research paper, and moderate steps, such as having the paper accepted, but also extreme steps, such as publishing the paper in a top journal (see the first row of the table). Conversely, moderately and neutrally competent behaviors involve only the less extreme features (see rows 2 and 3).
To be realistic, we first equated the frequency of extreme trait-consistent behavior with that of neutral behavior, since extreme behavior is less likely than moderate behavior. To reflect the general finding that observers typically hold positive expectations about other people, we also assumed that behaviors of persons with high ability and morality traits were more frequent (i.e., 6, 7, 6, 5, 4, from high to low) than those of persons with the low traits (i.e., 4, 5, 4, 3, 2, from low to high). The realism of this learning history is reflected in the fact that the overall distribution contains considerably more neutral than moderate or extreme features in people's behavior, and considerably more moderate than extreme features.
The asymmetric distribution in the learning history is reflected in the fact that a high-ability actor usually performs well to very well, but sometimes moderately or even very poorly. For example, a top researcher usually publishes in top-ranked journals, but sometimes also in lower-ranked ones. A low-ability actor, by contrast, will usually perform poorly or sometimes moderately, but never extremely well. In the table, this non-diagnostic behavior was introduced by setting the frequencies considerably lower: the frequencies were set to zero for extreme behavior and to 3 for moderate behavior. In the morality domain, this asymmetry is reversed. That said, the simulation was quite robust to changes in the non-zero frequencies of Table 4, such as fitting the non-zero frequencies to other smooth distributions or even equating them all.
Table 4. Asymmetry in ability and morality inferences (Simulation 3)

                      Behavioral features                        Trait
              A++ A+  A0  A–  A––  M++ M+  M0  M–  M––    A+  A–  M+  M–
High ability behavior
  # 6          1   1   1   0   0    0   0   0   0   0      1   0   0   0
  # 7          0   1   1   0   0    0   0   0   0   0      1   0   0   0
  # 6          0   0   1   0   0    0   0   0   0   0      1   0   0   0
  # 5          0   0   1   1   0    0   0   0   0   0      1   0   0   0
  # 4          0   0   1   1   1    0   0   0   0   0      1   0   0   0
Low ability behavior
  # 0          1   1   1   0   0    0   0   0   0   0      0   1   0   0
  # 3          0   1   1   0   0    0   0   0   0   0      0   1   0   0
  # 4          0   0   1   0   0    0   0   0   0   0      0   1   0   0
  # 5          0   0   1   1   0    0   0   0   0   0      0   1   0   0
  # 4          0   0   1   1   1    0   0   0   0   0      0   1   0   0
High morality behavior
  # 6          0   0   0   0   0    1   1   1   0   0      0   0   1   0
  # 7          0   0   0   0   0    0   1   1   0   0      0   0   1   0
  # 6          0   0   0   0   0    0   0   1   0   0      0   0   1   0
  # 3          0   0   0   0   0    0   0   1   1   0      0   0   1   0
  # 0          0   0   0   0   0    0   0   1   1   1      0   0   1   0
Low morality behavior
  # 2          0   0   0   0   0    1   1   1   0   0      0   0   0   1
  # 3          0   0   0   0   0    0   1   1   0   0      0   0   0   1
  # 4          0   0   0   0   0    0   0   1   0   0      0   0   0   1
  # 5          0   0   0   0   0    0   0   1   1   0      0   0   0   1
  # 4          0   0   0   0   0    0   0   1   1   1      0   0   0   1
Test
  Trait A+     0   0   ?   ?   ?    0   0   0   0   0      1   0   0   0
  Trait A–     ?   ?   ?   0   0    0   0   0   0   0      0   1   0   0
  Trait M+     0   0   0   0   0    0   0   ?   ?   ?      0   0   1   0
  Trait M–     0   0   0   0   0    ?   ?   ?   0   0      0   0   0   1

Note. Schematic representation of the experimental setup of Skowronski and Carlston (1987, Experiment 1). A = ability, M = morality, ++ = extremely positive, + = positive, 0 = neutral, – = negative, –– = extremely negative; # = number of trials. Cell entries denote external activation. All learning trials were presented in an order that was randomized for each run.
Trait-inconsistent behavior was considered by Skowronski and Carlston (1987) to be the most diagnostic case and therefore the most important test of their cue diagnosticity hypothesis. As an example of trait-inconsistent behavior, participants were asked whether an honest person would ever engage in immoral behavior. As shown at the bottom of Table 4, these judgments were simulated by priming the trait and then reading off the output activation of the inconsistent behavior. The reverse test direction, which would theoretically make more sense, would be to ask for the trait implied by the behavior (but this was not used in the experiment). This would be simulated by priming the behavior and testing the trait's output activation; this procedure worked equally well in the simulation.
Simulation results. The learning histories and test prompts of Table 4 were run for 50 "participants," each with a different random order of trials. Figure 5 shows the results with a learning rate of .10. As can be seen, the simulation results closely parallel the empirical data of Skowronski and Carlston (1987), and the predicted interaction is significant, F(1, 196) = 79.98, p < .001. Inconsistent low-ability behavior is judged more probable for a high-ability person than inconsistent high-ability behavior for a low-ability person, t(98) = 8.42, p < .001. Conversely, inconsistent moral behavior is judged more probable for a low-morality person than inconsistent immoral behavior for a high-morality person, t(98) = 4.18, p < .001.
The underlying connectionist principle that produces these results is that sparsely experienced behavior acquires little diagnostic strength. Recall that the tests probed trait-inconsistent behavior. Because high-ability behavior never occurred for low-ability actors, low ability is not a good predictor of (inconsistent) high-ability behavior. In contrast, high ability is a relatively good predictor of (inconsistent) low-ability behavior, since high-ability actors exhibited both high- and low-ability behaviors. Analogous reasoning applies to moral behavior and to the reverse behavior-trait relations. In short, the network cannot express what it never learned.
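This "cannot express what it never learned" point can be illustrated with a toy delta-rule sketch (our simplification of Simulation 3; the trial frequencies below are hypothetical stand-ins, not the exact values of Table 4):

```python
import random

# Hypothetical skewed learning history (cf. Table 4, heavily simplified):
# high-ability (HA) actors show high performance 6x and low performance 4x;
# low-ability (LA) actors show low performance 5x and high performance never.
trials = ([("HA", "B_high")] * 6 + [("HA", "B_low")] * 4 +
          [("LA", "B_low")] * 5)

behaviors = ("B_high", "B_low")
weights = {(t, b): 0.0 for t in ("HA", "LA") for b in behaviors}

lr = 0.02
random.seed(1)
for _ in range(300):
    random.shuffle(trials)
    for trait, shown in trials:
        for b in behaviors:
            target = 1.0 if b == shown else 0.0
            # delta rule: move the trait -> behavior weight toward the target
            weights[(trait, b)] += lr * (target - weights[(trait, b)])
```

Because the pairing (LA, B_high) never occurs, its connection stays at zero, whereas the (HA, B_low) connection converges toward the experienced conditional frequency: the asymmetric diagnosticity falls directly out of the skewed input distribution.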
Extensions and further research. The same learning history also reproduced Skowronski and Carlston's (1987) prediction and finding that trait ratings are higher for moderately inconsistent behavior than for extremely inconsistent behavior. This was tested in the simulation by comparing the resulting output activation of extremely positive or negative behavior (represented by "?" at the bottom of Table 4) with the resulting output activation of moderate behavior (not shown).
With respect to trait-consistent behavior, Skowronski and Carlston (1987) reported no differences in diagnosticity, although earlier work had suggested that typical behavior is usually highly diagnostic of the implied trait (e.g., Anderson, 1981; Cantor & Mischel, 1977). Consistent with these earlier findings, our model predicts diagnosticity differences for trait-consistent behavior in the same direction as for trait-inconsistent behavior. It thus predicts that high-ability behavior is more indicative of high ability than low-ability behavior is of low ability, and that immoral behavior is more indicative of low morality than moral behavior is of high morality.
A further question is how the previously acquired knowledge about the different levels and diagnosticity of ability-related and morality-related behavior (as simulated above) is used to draw inferences about a particular actor. We assume that this prior learning about behavior-trait relations is stored in semantic memory and that novel behavioral information is automatically transferred to the implied trait. There is ample evidence that traits are inferred automatically and immediately when reading behavioral information (for a review, see Uleman, 1999). The activated trait is then linked to the actor, creating a connection between the actor and the implied trait. Skowronski and Carlston (1987) investigated this process in a further experiment. Participants read information about actors at five different levels of ability-related or moral behavior and then rated how intelligent or moral each actor was. As expected, the results documented a linear decrease in inferences going from extremely high to extremely low levels of behavior. Our recurrent model can easily reproduce this linear relationship with the same learning history of Table 4, extended with an actor node.
In the following simulations, we do not explicitly implement this previously learned behavior-trait association, but rather assume, as before, that the implied trait is automatically activated from semantic memory. This allows us to focus on other interesting phenomena of person impressions.

Figure 5. Asymmetric inferences of ability and morality: observed data from Skowronski and Carlston (1987, Experiment 1) and simulation results (learning rate = .10). The human data are from Figure 1 in "Social Judgment and Social Memory: The Role of Cue Diagnosticity in Negativity, Positivity, and Extremity Biases," by J. J. Skowronski and D. E. Carlston, 1987, Journal of Personality and Social Psychology, 52, pp. 689-699. Copyright 1987 by the American Psychological Association. Adapted with permission.
Memory for behavioral information
When we learn about others from our own observations, we infer characteristics from the behaviors associated with them. In addition to how we draw these conclusions, as discussed in the previous section, it is also important to understand how these behavioral observations are stored in memory. An intriguing finding is that inconsistent or unexpected behavioral information about an actor is often remembered better than information consistent with the expectation of a dominant trait (for review, see Stangor & McMillan, 1992). Therefore, we remember a hooligan helping an elderly woman across the street better than a nurse performing the same act.
Hastie (1980; Hastie & Kumar, 1979) argued that inconsistent information requires additional cognitive effort to explain and understand the inconsistency, and is therefore elaborated more extensively. This creates additional links between the inconsistent information and other items in memory, and hence better recall. Hastie supported this interpretation with research showing that inconsistent information leads to more causal elaborations of behavioral statements. However, these elaborations were explicitly requested from the participants after the initial impression formation phase was completed. It is therefore unclear whether they were generated spontaneously during initial encoding or constructed only after the request (cf. Nisbett & Wilson, 1977). Moreover, recent research has challenged the assumption that inconsistent behavior is more strongly linked to other behavioral information about the actor, since support for this assumption relied on flawed measures of associative strength (Skowronski & Gannon, 2000; Skowronski & Welbourne, 1997). From this it was concluded that "associative linkages may not represent the only mechanism, and may not even represent the primary mechanism, for memory incongruity effects" (Skowronski & Gannon, 2000, p. 17).
Simulation 4: Higher recall for inconsistent information
Can our connectionist principles account for the enhanced memory of inconsistent information without resorting to explicit elaboration processes or associations between behaviors? Yes, and to illustrate this we simulate a well-known experiment by Hamilton, Katz, and Leirer (1980, Experiment 3). Participants read information about various fictitious actors. For each actor, they read a list of 10 consistent and 1 inconsistent behavioral descriptions, after which they had to recall as many of the behavioral sentences as possible. The consistent descriptions involved everyday behaviors (e.g., read the newspaper, cleaned the house), whereas the inconsistent description involved violent behavior (e.g., lost his temper and hit a neighbor, insulted his secretary without provocation). Half of the participants were instructed to form an impression of the actor; the other half were instructed to memorize the behavioral information. After a distractor task, participants were asked to "list as many behavioral descriptions as they could remember" (p. 1053). The recall data (see Figure 6) documented that impression instructions led participants to recall more inconsistent items, whereas this difference disappeared under memory instructions.
Simulation. To understand the enhanced memory for inconsistent behavioral information, consider a network architecture with one actor node connected to two trait nodes. One trait node reflects a neutral trait, as conveyed by the descriptions of everyday (nonviolent) behavior, whereas the other reflects a violent trait, as implied by the inconsistent violent behavior. In addition, each behavioral description is represented by a separate node reflecting the specific behavioral episode. In sum, each description is represented by two nodes, one reflecting the categorical trait implied by the behavior and one reflecting the individual behavioral episode. Table 5 contains a schematic representation of the experiment by Hamilton et al. (1980), with 10 consistent behaviors and 1 inconsistent behavior.
Figure 6. Recall of inconsistent behavioral information after impression formation and memory instructions: observed data from Hamilton, Katz, and Leirer (1980, Experiment 3) and simulation results (learning rate under impression instructions = .27, under memory instructions = .027). The human data are from Table 2 of "Cognitive Representation of Personality Impressions: Organizational Processes in First Impression Formation," by D. L. Hamilton, L. B. Katz, and V. O. Leirer, 1980, Journal of Personality and Social Psychology, 39, pp. 1050-1063. Copyright 1980 by the American Psychological Association.
As can be seen in the table, each behavior was activated together with its implied trait and the actor node. As predicted by the diffusion principle, however, many of the consistent behaviors are not activated when the consistent trait is present, resulting in a negative learning error for those behaviors and weaker trait→behavior connections. The more behaviors confirm the consistent trait, the less each individual behavior is predictive of that trait. Because there are more consistent than inconsistent behaviors, diffusion is particularly strong for the consistent behaviors. As a result, trait→behavior connections are weaker for consistent than for inconsistent behavior. Importantly, the learning error that produces greater diffusion for consistent behavior affects not only the trait→behavior connections but also the actor→behavior connections, since the actor is always activated together with the traits.
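A stripped-down sketch of the diffusion principle (our illustration, collapsing the recurrent architecture into simple trait→behavior weights; only the 10:1 item ratio follows the design above):

```python
# trait -> behavior weights under the delta rule; 10 behaviors share the
# neutral trait, while the violent trait has one behavior to itself.
N = 10                       # number of consistent behaviors
w_neutral = [0.0] * (N + 1)  # neutral trait -> each behavioral episode
w_violent = [0.0] * (N + 1)  # violent trait -> each behavioral episode

lr = 0.15
for _ in range(200):
    for item in range(N + 1):        # items 0..9 consistent, item 10 inconsistent
        w = w_violent if item == N else w_neutral
        for j in range(N + 1):
            target = 1.0 if j == item else 0.0
            w[j] += lr * (target - w[j])  # delta rule on the active trait only

# "Recall": a stronger trait -> behavior weight means higher retrieval likelihood.
recall_inconsistent = w_violent[N]
recall_consistent = max(w_neutral[:N])
```

Because the ten consistent behaviors take turns being absent while their shared trait is present, each of their connections is repeatedly pushed down (diffusion); the single inconsistent behavior never competes with siblings and keeps a strong connection.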
In contrast, in the memory condition, observers are less motivated to form a coherent impression of the actor. As noted earlier, we assumed that this results in much shallower encoding of the actor and trait information, which was simulated by setting the learning rate to 10% of its original value. As a consequence, all connection weights between the actor or trait and the behaviors become much weaker.
To simulate recall of the behavioral episodes, we activated the actor node and read off the resulting activation of each behavioral episode (see bottom panel of the table). This priming procedure assumes that the actor is most readily available in memory and serves as a cue for recalling the behaviors. The resulting activation reflects the likelihood that a given behavioral episode will be recalled. Because research shows that participants may also use the actor's traits as retrieval cues (e.g., Hamilton, Driscoll, & Worth, 1989), we also ran simulations in which the traits were primed as well; these led to very similar results.
Simulation results. Figure 6 shows the results of the recurrent simulation with 50 "participants," each with a different randomized trial order. The learning rate under impression instructions was .27 and, as noted above, was set to 10% of this value (.027) under memory instructions to reflect shallower encoding. The predicted interaction was significant, F(1, 196) = 5.72, p < .05. As can be seen, the simulation reproduced the basic finding that inconsistent information was recalled better than consistent information under impression instructions, t(98) = 2.33, p < .05, whereas under memory instructions this difference disappeared, t < 1. The diffusion principle implies that the higher recall of inconsistent behavioral information is due mainly to the relatively stronger trait→behavior and actor→behavior connections of unique or rare behavioral information. This connectionist account thus emphasizes the direct connections from the person or trait to the behavioral episodes. It is consistent with Skowronski's argument (Skowronski & Gannon, 2000; Skowronski & Welbourne, 1997) that associations between behaviors are not responsible for the better recall of inconsistent behavior, contrary to earlier proposals by Hastie (1980) and Srull (1981).
Extensions and further research. This network model generates a number of additional "postdictions" (i.e., "predictions" of findings that are already
Table 5. Memory for inconsistent information (Simulation 4)

                           Trait              Behavioral episodes
                  Actor  Neutral  Violent   Consistent         Inconsistent
Item list
  # 1 consistent     1      1        0      1  0  ..  0  0          0
  # 1 consistent     1      1        0      0  1  ..  0  0          0
  …
  # 1 consistent     1      1        0      0  0  ..  1  0          0
  # 1 consistent     1      1        0      0  0  ..  0  1          0
  # 1 inconsistent   1      0        1      0  0  ..  0  0          1
Recall test
  consistent         1      0        0      ?  ?  ..  ?  ?          0
  inconsistent       1      0        0      0  0  ..  0  0          ?
Biased (and unbiased) recognition test
  consistent         ?     (?)a      0      1b 1b ..  1b 1b         0
  inconsistent       ?     (?)a      0      0  0  ..  0  0          1

Note. Schematic representation of the experimental setup of Hamilton, Katz, and Leirer (1980, Experiment 3). There were 10 consistent items in total. Although the inconsistent item was presented at specific positions in the list, here (as in most similar studies) a random order is simulated. Cell entries denote external activation; # = number of trials. All learning trials were presented in an order that was randomized for each run. Under memory instructions, the learning rate was reduced to 10% of its value under impression instructions.
a In the biased recognition test, the consistent (neutral) trait was primed; in the unbiased test, its activation was set to zero.
b Only one episode node was activated at a time (with value +1), and the resulting output activations of all these nodes were averaged.
independently known). The model predicts better recall of items at the end of a list than at the beginning, because diffusion affects earlier information more than later information. This has been confirmed by research (Srull, Lichtenstein, & Rothbart, 1985, Experiments 5 and 6). In addition, the model predicts a smaller recall advantage as the number of inconsistent items increases, since this produces more diffusion among the inconsistent items. This prediction has also been confirmed (Hastie & Kumar, 1979, Experiment 3; Srull, 1981, Experiments 1-3; Srull et al., 1985, Experiment 3). However, research has shown that even when the numbers of consistent and inconsistent items are equal and inconsistency is manipulated by providing prior trait expectations about the actor, a recall advantage remains (Hastie & Kumar, 1979, Experiment 3). The model also predicts this outcome when a prior learning phase is inserted in which the actor is first associated with the consistent trait.
Support for Hastie's (1980) alternative proposal of effortful elaboration came from studies that found reduced recall of inconsistent behavior when mental resources were limited by restricting presentation time, making the task more complex, or adding distractor tasks (Bargh & Thein, 1985; Hamilton et al., 1989; Macrae, Hewstone, & Griffiths, 1993; Stangor & Duan, 1991). However, these results can easily be simulated in our connectionist network by simply assuming that load reduces the encoding of the behavioral episodes, or even of all information (e.g., by reducing the learning rate to 10% of its original value). This suggests that poorer encoding of episodic information, rather than reduced elaboration of inconsistencies, may be responsible for the reduced recall of inconsistent information (for a similar view, see Pandelaere & Hoorens, 2002). Again, it does not seem necessary to postulate explicit elaboration to account for the higher recall of inconsistent behavior.
Other conditions in which less cognitive effort is devoted to the impression task, and which typically do not show enhanced memory for inconsistent information, can be explained in a similar manner. For example, when an impression or stereotype is formed of a group of individuals rather than of a single individual, enhanced memory for stereotype-consistent behavior is typically observed (for a review, see Fyock & Stangor, 1994). This can be simulated by assuming a reduced learning rate, based on the idea that observers do not expect the same degree of trait consistency across a group of people and are therefore less willing to invest cognitive effort in encoding a coherent overall impression. This eliminates the enhanced memory for inconsistent behavior, and to the extent that the dominant stereotype guides guessing when memory fails, a stereotype-consistency memory effect is observed.
This network can also simulate recognition measures, in which the behaviors are presented to participants who are asked to match each of them to the correct actor. Hence, in contrast to recall, recognition is simulated by testing the reverse behavior→actor direction. Recognition is typically biased toward consistent information, presumably because consistent traits guide recognition when the observer relies on guessing rather than on genuine memory traces (for a review, see Stangor & McMillan, 1992). This can be replicated by a recognition test biased toward behavior consistent with the consistent trait (see the "?" for the consistent neutral trait at the bottom of Table 5). However, when this bias is removed in the simulation (by setting the activation of all consistent and inconsistent traits to zero), inconsistent behavior is again recognized better than consistent behavior, in line with research showing that bias-corrected recognition sensitivity measures reveal better memory for inconsistent behavior (see Stangor & McMillan, 1992). Importantly, unlike recall, the enhanced recognition of inconsistent behavior rests on the competition between trait→actor and behavior→actor connections. When consistent behaviors are presented, the trait they reflect becomes strongly connected with the actor. This strong trait→actor connection competes with the weaker behavior→actor connections, so that the latter connections are discounted (see also Van Rooy et al., 2003).
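The competition property can likewise be sketched with a minimal delta-rule example (our illustration, not the full recurrent model): the neutral trait, which accompanies the actor on 10 trials, absorbs most of the predictive weight, leaving little for the individual consistent behavior episodes.

```python
# Competition: trait -> actor and behavior -> actor connections jointly
# predict the actor, so frequent cues crowd out rare ones.
N = 10                          # consistent behaviors sharing the neutral trait
w_trait = {"neutral": 0.0, "violent": 0.0}
w_behavior = [0.0] * (N + 1)    # one episode node per behavior

lr = 0.2
for _ in range(300):
    for item in range(N + 1):   # item 10 is the inconsistent (violent) episode
        trait = "violent" if item == N else "neutral"
        pred = w_trait[trait] + w_behavior[item]  # joint prediction of the actor
        err = 1.0 - pred                          # the actor is always present
        w_trait[trait] += lr * err                # both active inputs share the error
        w_behavior[item] += lr * err
```

The neutral trait, updated on 10 trials per epoch, wins the competition and suppresses the consistent behavior→actor weights, whereas the inconsistent episode shares credit equally with the rarely seen violent trait and so retains a stronger connection, yielding better recognition.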
Overall, the proposed model appears to be broadly consistent with a relatively wide range of research evidence. This suggests that the diffusion principle offers an interesting alternative hypothesis to explain the enhanced recall of inconsistent information. In addition, competition may play a further role in the enhanced recognition of inconsistent information when recognition measures are used.
Assimilation and contrast
An important characteristic of recurrent models is their ability to generalize. When a trained network is exposed to an incomplete pattern of information, it fills in the missing information on the basis of previously learned complete patterns. This property was convincingly demonstrated by Smith and DeCoster (1998, Simulations 1 and 2), who used a recurrent network very similar to the present one. This process of generalization can be viewed as a form of assimilation, since prior experiences shape the way we perceive and interpret new information that is similar or closely related to them. For example, when we see a picture of Hitler, we can immediately supplement that image with activated memories of his wars of aggression, the mass extermination of the Jews, and so on. There is ample evidence that accessible knowledge such as traits, stereotypes, moods, emotions, and attitudes readily leads to generalization to unobserved attributes.
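Pattern completion in such a network can be sketched as follows (a minimal illustration with hypothetical features; linear activation, delta learning, and no self-connections, in the spirit of the architecture described here):

```python
# Auto-associator with linear activation, delta learning, no self-connections.
features = ["Hitler", "wars", "genocide", "Gandhi", "nonviolence"]
patterns = [
    [1, 1, 1, 0, 0],  # a "Hitler" episode
    [0, 0, 0, 1, 1],  # a "Gandhi" episode
]
n = len(features)
W = [[0.0] * n for _ in range(n)]  # W[i][j]: lateral connection i -> j

lr = 0.2
for _ in range(100):
    for p in patterns:
        for j in range(n):
            # internal prediction of unit j from all other units
            pred = sum(p[i] * W[i][j] for i in range(n) if i != j)
            err = p[j] - pred  # delta rule
            for i in range(n):
                if i != j:
                    W[i][j] += lr * p[i] * err

# Pattern completion: cue only "Hitler" and read the internal activations.
cue = [1, 0, 0, 0, 0]
completion = [sum(cue[i] * W[i][j] for i in range(n)) for j in range(n)]
```

Cueing the incomplete pattern activates the missing features of the learned "Hitler" pattern (assimilation) while leaving the unrelated "Gandhi" features silent.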
Perhaps a more intriguing and unique property of recurrent networks is the creation of emergent attributes by combining parts of existing attributes (see Smith & DeCoster, 1998, Simulation 3). Traditional categorization theories assume that people apply a single schema, stereotype, or knowledge structure to make inferences about a target person or group. When multiple schemas are relevant, each is assumed to be activated and applied separately. Yet people can combine multiple sources of knowledge to construct emergent attributes that describe subtypes or subgroups of people. For example, a militant feminist who is also a bank teller might be categorized as a feminist bank teller with specific idiosyncratic attributes that belong to neither the militant-feminist nor the bank-teller representation (Asch & Zukier, 1984; Smith & DeCoster, 1998). Earlier connectionist models, such as constraint satisfaction models (Kunda & Thagard, 1996), were unable to model this process.
Simulation 5: Assimilation with Traits, Contrast with Exemplars
The plethora of assimilation effects found in social cognition research may give rise to the idea that filling in unobserved traits is the default or most natural process. Thus, when we are primed with "violent," we rate an unspecified or ambiguous target as more hostile, and when primed with "friendly," we rate the same target as less hostile. Under certain circumstances, however, the opposite effect can occur: primed concepts can sometimes lead to contrast rather than assimilation.
For example, when people are primed with the exemplar "Gandhi," they may judge a target as relatively more hostile, whereas when primed with "Hitler," they may judge the same target as relatively less hostile. In these circumstances, the exemplars Gandhi and Hitler serve as anchors against which the target is judged, resulting in contrast effects. In short, contextually (or chronically) primed information can serve not only as an interpretive frame, leading to assimilation in impression formation, but also as a standard of comparison, leading to contrast.
What produces assimilation or contrast? According to Stapel, Koomen, and van der Pligt (1997), trait concepts lend themselves to the interpretation of an ambiguous person description (assimilation) because traits have only a conceptual meaning. In contrast, exemplars, provided they are sufficiently extreme, are used as a standard of comparison (contrast), because both the exemplar and the target are persons who can be compared with one another. An experiment by Stapel et al. (1997, Experiment 3) confirmed this proposal. Participants were asked to form an impression of an ambiguously friendly or hostile actor. Before receiving the target description, they were primed with traits (e.g., violent or nice) or with the names of extreme exemplars (e.g., Hitler or Gandhi). Afterward, they indicated their impression of the target by rating five trait dimensions implying either high or low hostility. A composite scale of these traits revealed assimilation in the trait priming condition but contrast in the exemplar priming condition (see Figure 7).
Simulation. A recurrent network can simulate this combination of assimilation and contrast. As shown in Table 6, the network first builds background knowledge on the extreme exemplars Gandhi and Hitler by associating them with friendly and hostile traits, respectively. The essential idea of the simulation is that, during priming, the primed stimulus and the target description are temporarily activated together. This was implemented by programming two learning trials. The first trial represents the priming event and the second the description of the ambiguous target, with the activation of the preceding priming trial left in place as starting activation in addition to the external activation of the actor (see Table 6). We tested the impression of the target by priming the target node and reading off the activation of the friendly node and the (reversed) activation of the hostile node.
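To make the mechanism concrete, here is a minimal sketch (our own simplification to a feedforward delta-rule associator with a single friendly output node, not the full auto-associator; the residual activation of .5 is an arbitrary stand-in for the leftover activation =a in Table 6) of why an exemplar prime produces contrast while a trait prime produces assimilation:

```python
import numpy as np

lr, residual = 0.15, 0.5

# Prior knowledge: a strong Gandhi -> friendly connection.
# Input nodes: [target, Gandhi]; output node: friendly.
w = np.zeros(2)
for _ in range(10):
    x = np.array([0.0, 1.0])              # Gandhi behaves friendly
    w += lr * x * (1.0 - w @ x)           # delta learning rule

# Exemplar prime: Gandhi's residual activation accompanies the ambiguous
# target trial (no external friendly activation), so the prediction error
# is negative and the target -> friendly connection is driven down: contrast.
w_ex = w.copy()
x = np.array([1.0, residual])
w_ex += lr * x * (0.0 - w_ex @ x)

# Trait prime: the friendly node itself retains residual activation, which
# acts as a teaching signal for the target: assimilation.
w_tr = w.copy()
x = np.array([1.0, 0.0])
w_tr += lr * x * (residual - w_tr @ x)

print("target->friendly after exemplar prime:", round(w_ex[0], 3))
print("target->friendly after trait prime:   ", round(w_tr[0], 3))
```

The exemplar prime leaves the target with a negative trait connection, the trait prime with a positive one, mirroring the two priming conditions of Table 6.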
Simulation results. Figure 7 shows the results for 50 participants, each run with a different randomized trial order and a learning rate of .15. As can be seen, the simulation reproduced the empirical results reported by Stapel et al. (1997). The predicted interaction was significant, F(1, 196) = 797.33, p < .001. Trait priming produced assimilation: the ambiguous actor was rated higher after priming with a positive rather than a negative trait, t(98) = 22.59, p < .001. Exemplar priming, by contrast, produced the opposite pattern: ratings were
CONNECTIONS AND PERSONAL CONTRIBUTIONS
Figure 7. Assimilation and contrast effects after priming with a trait or an exemplar: observed data from Stapel et al. (1997, Experiment 3) and simulation results (learning rate = .15). The human data are from Table 3 of "Categories of Category Accessibility: The Impact of Trait Concept Versus Exemplar Priming on Person Judgments" by D. A. Stapel, W. Koomen, and J. van der Pligt, 1997, Journal of Experimental Social Psychology, 33, pp. 47-76. Copyright 1997 by Academic Press.
lower after priming with a positive versus a negative prime, t(98) = 17.42, p < .001.
How was this result obtained? When a trait concept is primed, its activation spills over into the actor's representation, leading to a stronger actor→trait connection through the acquisition property. Because rating the actor's trait involves consulting this actor→trait connection, this produces the usual assimilation of the trait impression. Conversely, when an exemplar such as Hitler is primed, competition arises between the primed exemplar and the target in their association with the hostile trait. The (stronger) Hitler→trait connection competes with the target→trait connection, driving the target→trait connection down and producing a contrast effect.
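The competition property invoked here is easy to see in isolation (our own illustration with hypothetical trial counts; a single output node stands in for the hostile trait):

```python
import numpy as np

lr = 0.15

def train(trials):
    """Feedforward delta-rule training of input -> hostile connections."""
    w = np.zeros(2)                      # inputs: [Hitler prime, target]
    for x, t in trials:
        x = np.asarray(x, float)
        w += lr * x * (t - w @ x)        # delta learning rule
    return w

pretrain = [([1, 0], 1.0)] * 10          # strong Hitler -> hostile link
joint    = [([1, 1], 1.0)] * 2           # prime and target co-activated
alone    = [([0, 1], 1.0)] * 2           # target without the prime

w_primed = train(pretrain + joint)
w_alone  = train(alone)
print("target->hostile with prime:", round(w_primed[1], 3))
print("target->hostile alone:    ", round(w_alone[1], 3))
```

Because the pre-trained prime already predicts the hostile trait, the error term on the joint trials is small, and the co-activated target acquires only a weak connection; the prime "absorbs" the trait, which is the discounting at the heart of the contrast effect.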
Extensions and further research. The network makes an interesting prediction concerning the effects of the extremity of exemplar and trait primes. According to the competition property, extreme exemplars, serving as a standard of comparison, should generate more competition in the network and thus stronger contrast. Similarly, as expected from the generalization property, extreme trait primes should produce stronger assimilation. This prediction was supported by a recent study by Moskowitz and Skurnik (1999). In two experiments, they found that moderate exemplars (e.g., Kissinger) produce less contrast than extreme exemplars (e.g., Hitler), and that moderate trait primes produce less assimilation than extreme trait primes. We reproduced this in our simulations by replacing the friendly Gandhi exemplar (from Stapel et al., 1997) with a moderately hostile Kissinger exemplar (as in Moskowitz & Skurnik, 1999) associated with the hostile trait at an activation of only .10, to obtain a moderate exemplar, and by priming the hostile trait with an activation of only .10, to simulate a moderate trait prime.
Interestingly, this latter simulation also reproduced the additional finding of Moskowitz and Skurnik (1999) that cognitive disruption (i.e., increased task load or interruption of the ongoing task) eliminated trait assimilation effects but left exemplar contrast effects relatively untouched. This was achieved by simulating reduced resources during priming with a reduced learning rate (10% of the original). This eliminated the assimilation effects but preserved the contrast effects found by Moskowitz and Skurnik (Experiments 3 and 4).
Discounting by Situational Information
Most of the simulations discussed so far are based on experimental paradigms in which trait-relevant information about an actor is provided in the form of trait adjectives or brief behavioral descriptions, and participants can assume that this information is relevant to that person. As noted earlier, however, social observers are much more sophisticated and often know when to use or to ignore such information. When situational constraints may have provoked the actor's behavior, or when many others behave similarly, observers tend to discount the behavioral information and are less likely to make a corresponding trait inference (Gilbert & Malone, 1995; Gilbert, Pelham, & Krull, 1988; Trope, 1986; Trope & Gaunt, 2000).
Table 6. Assimilation and Contrast (Simulation 5)

                            Target  Gandhi  Hitler  Friendly  Hostile
Prior learning history
#10  Gandhi                    0      1       0        1         0
#10  Hitler                    0      0       1        0         1
Condition 1: Gandhi exemplar prime
#1   Gandhi                    0      1       0        0         0
#1   Target description        1      =a      0        0         0
Condition 2: Hitler exemplar prime
#1   Hitler                    0      0       1        0         0
#1   Target description        1      0       =a       0         0
Condition 3: Friendly trait prime
#1   Friendly                  0      0       0        1         0
#1   Target description        1      0       0        =a        0
Condition 4: Hostile trait prime
#1   Hostile                   0      0       0        0         1
#1   Target description        1      0       0        0         =a
Test                           1      0       0        ?        –?

Note. Schematic representation of the prior knowledge acquisition and the experimental design of Stapel et al. (1997, Experiment 3). Cell entries denote external activation; # = number of trials; =a indicates that the activation from the preceding priming trial remains as starting activation for the current trial. All trials of the learning history were presented in an order randomized for each run, and each condition was simulated separately.
The effect of this discounting process should be most visible when situational constraints are explicitly manipulated in the experiment. In the following simulations, we discuss such experimental paradigms, in which more complex behavioral scenarios about an actor are given, including information about the situation, so that the observer must determine whether the actor caused the behavior before a corresponding trait inference can be made. Situational discounting rests on the competition property discussed earlier. This property has already been demonstrated for purely causal attributions using a connectionist network with the delta learning algorithm (Read & Montoya, 1999; Van Overwalle, 1998), but not yet for trait inferences. The goal of the next section is to extend this approach to trait inferences, to test whether the principles underlying causal attribution generalize to traits, and to further motivate the use of context nodes as in the previous simulations.
Simulation 6: Situational Correction or Integration?
One of the current debates in the trait attribution literature concerns the process by which trait inferences corresponding to behavior are discounted in the face of situational demands. So-called correction theories assume that observers first automatically attribute the behavior to the corresponding trait and then discount this inference on the basis of situational information in a separate, resource-dependent stage (Gilbert & Malone, 1995; Gilbert et al., 1988). In contrast, integration theories posit that situational information is used as an integral part of behavioral inference, with the weighing of personal and situational factors "requiring an iterative or even simultaneous evaluation of the various hypotheses before arriving at a conclusion" (Trope & Gaunt, 2000, p. 353; see also Trope, 1986). Clearly, the connectionist perspective is more consistent with the latter view, since network models typically allow many processes to run in parallel without requiring separate, sequential stages.
To show that discounting in trait inference can be explained in part by parallel processing in a connectionist network, we focus on the work of Trope and Gaunt (2000). Trope and Gaunt revisited the well-known finding that situational information is often underused in discounting trait inferences, particularly under cognitive load. Correction theorists have often taken this finding as evidence that the effortful correction stage was disrupted by the load manipulation (Gilbert & Malone, 1995). Trope and Gaunt argued, however, that situational information may be underutilized because it is less salient or less applicable, and that this, rather than a disrupted effortful correction, might explain why
cognitive load often interferes with discounting. To support their view, they made the situational information more salient, accessible, or applicable, and found that under these circumstances situational information was used to discount trait inferences even under cognitive load (Trope & Gaunt, 2000).
In one of their experiments, Trope and Gaunt (2000, Experiment 3) described a teaching assistant who used strict criteria in grading an exam. Information about situational demands varied across conditions. In a no-demand condition, no demands were mentioned; in a general-demand condition, participants were told that there was a university-wide requirement to use strict grading criteria; and in a specific-demand condition, participants were told that the examining professor had given the assistant specific instructions to apply strict criteria. Finally, participants rated how strict the assistant was on a 13-point scale ranging from 1 (not at all a strict person) to 13 (a very strict person). In the cognitive load condition, participants had to memorize an eight-digit number during the task. The results showed that, under no load, discounting occurred in all demand conditions; that is, the teaching assistant was judged less strict given both the general and the specific demand. More importantly, and consistent with their integration prediction, Trope and Gaunt found that under cognitive load discounting still occurred when the demand was specific and directly applicable, but not when it was general and less applicable (see Figure 8). This shows that discounting cannot have taken place in a separate effortful correction stage, since cognitive load should have prevented any discounting in that stage. In the following simulation, we show that a parallel, integrative account can explain these results by replicating Trope and Gaunt's experiment.
Figure 8. Integration of situational information: observed data from Trope and Gaunt (2000, Experiment 3) and simulation results (learning rate without load = .33; with load = .18). The human data are from Table 3 of "Processing Alternative Explanations of Behavior: Correction or Integration?" by Y. Trope and R. Gaunt, 2000, Journal of Personality and Social Psychology, 79, pp. 344-354. Copyright 2000 by the American Psychological Association.
Simulation. Table 7 provides a schematic description of the learning history reflecting Experiment 3 of Trope and Gaunt (2000). As can be seen, the distinction between specific and general demands was implemented in the prior learning history that participants bring with them. In this learning history, it was assumed that teaching assistants typically grade strictly, as do professors, although professors have more experience (i.e., more learning trials). In contrast, general demands from the university administration were assumed to be complied with less consistently (i.e., strict grading usually followed, but occasionally more lenient grading was given). Consequently, a specific demand by the professor should result in a stronger demand→trait connection than a general demand by the university administration. (Different trial frequencies produce similar simulation results as long as they preserve a similar strict-to-lenient ratio.)
The three demand conditions were then simulated in separate simulation runs, assuming two trials per condition. Cognitive load was simulated by reducing the learning rate to 50% of its original value during the three demand conditions. In contrast to some previous simulations, we assumed more trials and a smaller reduction of the learning rate because the information consisted not of simple trait adjectives or trait-implying behaviors, but of more elaborate behavioral scenarios that presumably take longer to process and attract more attention. Finally, trait inferences were measured as before by priming the actor node and reading off the activation of the trait nodes.
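The logic of this simulation can be condensed into a short sketch (the trial numbers follow Table 7; everything else, including the feedforward simplification, the fixed trial order, and modeling load purely as a halved learning rate on the demand trials, is our own):

```python
import numpy as np

def delta_train(w, trials, lr):
    for x, t in trials:
        x = np.asarray(x, float)
        w = w + lr * x * (t - w @ x)     # delta learning rule
    return w

# Inputs: [assistant, administration, professor]; output node: "strict".
history = ([([1, 0, 0], 1.0)] * 5 +      # assistant grades strictly
           [([0, 0, 1], 1.0)] * 10 +     # professor: always strict
           [([0, 1, 0], 1.0)] * 8 +      # administration: mostly strict...
           [([0, 1, 0], 0.0)] * 2)       # ...but occasionally lenient

def assistant_strictness(demand, lr_demand):
    w = delta_train(np.zeros(3), history, lr=0.33)
    w = delta_train(w, [(demand, 1.0)] * 2, lr_demand)
    return w[0]                          # assistant -> strict connection

no_demand, general, specific = [1, 0, 0], [1, 1, 0], [1, 0, 1]
for label, lr in [("no load", 0.33), ("load", 0.18)]:
    print(label,
          round(assistant_strictness(no_demand, lr), 2),
          round(assistant_strictness(general, lr), 2),
          round(assistant_strictness(specific, lr), 2))
```

The strongly pre-trained professor node competes most, so the specific demand discounts the assistant's strictness most; the administration's weaker connection competes less, and the lower learning rate under load attenuates both effects. (This toy version omits the randomization, the recurrent connections, and the lenient output node, so it only illustrates the direction of the effects, not the reported statistics.)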
Simulation results. The simulation was run with 50 participants in each condition, with a different random trial order for each participant. The learning rate was .33 under no load and was reduced to .18 under load. The results presented in
Figure 8 show that the recurrent network largely replicated the results of Trope and Gaunt (2000). The predicted interaction between demand and cognitive load reached significance, F(2, 294) = 4.08, p < .05. More importantly, as predicted, demand specificity had a different effect under cognitive load than under no load. Although all demand conditions produced discounting, t(98) = 4.78 to 9.74, p < .001, the general and specific demands did not differ under no load, t(98) < 1, ns. Under load, in contrast, the general demand led to less discounting of the trait inference than the specific demand, t(98) = 3.03, p < .01.
How was discounting achieved in the simulation? It resulted from the competition property. Recall that in the prior learning history, strong demand→trait connections had been built up for the professor and the administration. When these sources of demand are present, their strong connections compete with the assistant's actor→trait connection. As a result, the assistant→trait connection is no longer increased, reflecting discounting relative to a no-demand condition. It is therefore important to recognize that, in this account, observers who are aware of the situational demand never make a trait inference that is subsequently reduced, as Gilbert and Malone's (1995) two-stage model would predict. On the contrary, from the very start of the parallel connection computation, the situational demand prevents the actor's trait inference from being made.
How did the simulation produce the crucial difference in discounting of the general (administration) demand between the no-load and load conditions? The answer is quite simple. Because of the greater processing under no load (hence the higher learning rate), competition was exerted more strongly, so that even the weaker general demand blocked corresponding trait inferences under no load.
Table 7. Integration of Situational Information (Simulation 6)

                                    Teaching   University
                                    Assistant  Administration  Professor  Strict  Lenient
Prior learning history
#5   Teaching assistant                1            0              0         1       0
#10  Professor                         0            0              1         1       0
#8   University administration         0            1              0         1       0
#2   University administration         0            1              0         0       1
Condition 1: No demand
#2   No demand                         1            0              0         1       0
Condition 2: General demand
#2   University administration         1            1              0         1       0
Condition 3: Specific demand
#2   Professor                         1            0              1         1       0
Test trait of teaching assistant       1            0              0         ?      –?

Note. Schematic representation of the prior knowledge acquisition and the experimental design of Trope and Gaunt (2000, Experiment 3). Cell entries denote external activation; # = number of trials. All trials of the learning history were presented in an order randomized for each run, and each condition was simulated separately. In the load condition, the learning rate was reduced to 50% in Conditions 1 to 3.
This was less the case under load, where the learning rate was lower and competition therefore weaker. The same principles used in our simulation also made it possible to reproduce the results of the other experiments of Trope and Gaunt (2000).
Simulation 7: Discounting and Sample Size
An important assumption in the previous simulation was that stronger or more numerous competing situational factors block trait inferences about the actor. This assumption rests on the combination of the acquisition property (to build strong situation→trait connections during prior learning) and the competition property (to produce discounting of the actor→trait connection). But is there more direct evidence for this assumption?
The combination of the acquisition and competition properties provides an interesting test case to distinguish our approach from earlier algebraic models of attribution (Cheng & Novick, 1992; Försterling, 1992) and impression formation (Anderson, 1981; Hogarth & Einhorn, 1992). These models do not predict that simply increasing the frequency of an alternative actor or situational factor results in greater discounting of a target actor (Van Overwalle & Van Rooy, 2001a). Alternative connectionist models also fail to make this prediction: the tensor product model of Kashima and Kerekes (1994) because it lacks the competition property, and the constraint satisfaction model of Kunda and Thagard (1996) because it lacks the acquisition property.
To test this prediction, Van Overwalle (2001) combined differences in sample size with discounting. Specifically, increasing the frequency of a competing actor's behavior should strengthen trait inferences about that actor, and this was expected to produce greater discounting of trait inferences about a target actor. Participants read several stories, each describing a competing actor performing a trait-implying behavior (e.g., Stephan solving questions of a quiz). The competing actor displayed this behavior either once (small sample size) or five times (large sample size). Then, identically across conditions, participants read five descriptions in which the competing actor displayed the same behavior together with a novel target actor (e.g., "Stephan and Walter together solved five more questions"). After receiving this information, they rated the traits of both actors; in our example, they rated how intelligent each actor was on an 11-point scale ranging from 0 (not at all intelligent) to 10 (very intelligent; Van Overwalle, 2001b). Consistent with our connectionist prediction, but in contrast to previous models, the results showed that a larger behavioral frequency (i.e., sample size) of the competing actor not only led to stronger inferences about the implied trait of the competing actor, but also
to significantly greater discounting of these trait inferences for the novel target actor (see Figure 9). In other words, when Stephan solves more questions, he is judged more intelligent, whereas Walter is judged less intelligent. These trait inference results parallel similar findings for purely causal attributions after the same combined manipulation of sample size and competition (Van Overwalle & Van Rooy, 2001a).
Simulation. Table 8 shows the simulation design for this experiment (Van Overwalle, 2001). Because each story described only one type of behavior, only one trait node was included (i.e., there were no behaviors implying the opposite trait). As can be seen, the competing actor first displayed the trait-implying behavior alone, and then together with the target actor. The crucial difference between the conditions is the sample size, that is, the number of times (one or five) the competing actor displayed the behavior alone.
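A compact sketch of this design (the trial numbers come from Table 8; the feedforward simplification and the single learning rate of .13 are our own):

```python
import numpy as np

def infer(n_alone, lr=0.13):
    """Trait connections after the competitor acts alone n_alone times,
    followed by five joint trials with the target (delta rule)."""
    w = np.zeros(2)                          # [competitor, target] -> trait
    trials = [([1, 0], 1.0)] * n_alone + [([1, 1], 1.0)] * 5
    for x, t in trials:
        x = np.asarray(x, float)
        w += lr * x * (t - w @ x)
    return w

small, large = infer(1), infer(5)
print("competitor small/large:", round(small[0], 2), round(large[0], 2))
print("target     small/large:", round(small[1], 2), round(large[1], 2))
```

With a larger sample of competitor-alone trials, the competitor's trait connection is already stronger at the start of the joint trials, leaving a smaller prediction error for the target to absorb; hence stronger competitor inferences and weaker target inferences, as in the empirical data.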
Simulation results. The simulation was run with 50 participants with a fixed trial order, and each sample size condition was run in a separate simulation. Figure 9 shows the results for a learning rate of .13 (and .08 for the competing actor). As can be seen, the simulation reproduced the empirical data. An ANOVA with sample size as between-subjects factor and actor rating (target vs. competitor) as repeated measure showed that the predicted interaction was significant, F(1, 196) = 95.38, p < .001. With a larger sample size, the simulation produced stronger trait inferences about the competing actor, t(98) = 6.63, p < .001, and at the same time weaker inferences about the target actor, t(98) = 6.60, p < .001. Thus, the simulation supports the unique prediction of the connectionist approach that the more often other people display the same behavior, the less observers believe that the target actor possesses the corresponding trait. Note that when
Figure 9. Discounting and sample size: observed data from Van Overwalle (2001) and simulation results (overall learning rate = .13; for the competing actor = .08).
the order of the trials is randomized rather than blocked by condition (as it was in the simulation), the network still predicts discounting, albeit to a lesser extent, because the competing actor's influence is then built up only during the later trials and therefore exerts less competition.2
Extensions and further research. As noted in the introduction, the competition property presented here can be extended to explain the influence of covariation information (Kelley, 1967) on trait inferences. When few other people behave like the actor (i.e., low consensus), we tend to make stronger person attributions and corresponding trait inferences than when many people show similar behavior (i.e., high consensus). When the actor displays the behavior only under particular circumstances (i.e., high distinctiveness), we tend to make more entity attributions, and hence weaker corresponding trait inferences, than when the actor behaves the same way in many different circumstances (i.e., low distinctiveness). Van Overwalle (1997, 2003) showed that manipulating these two covariation dimensions together yields trait inferences similar to those obtained by Stewart (1965; see also Figure 3), where only simple trait descriptions of a single target were given. These results can be simulated with a learning history similar to that of Simulation 1.
Fitting the Data and Model Comparisons
The simulations we reported reproduced all the empirical data and theoretical predictions reasonably well. However, it is possible that this fit is due to particular procedural choices in the simulations rather than to more general conceptual validity. The purpose of this section is to show that, in general, changes to these choices do not invalidate our simulations. To this end, we examine a range of issues, including localist versus distributed encoding of concepts and the specific recurrent network used. In addition, we discuss how the recurrent approach compares with other network models. We do not discuss the algebraic models of Busemeyer (1991) and Hogarth and Einhorn (1992), since these are in fact simplified versions of the delta algorithm used in connectionist models. We address each issue in turn.
The first issue is whether the nodes in the auto-associative architecture encode localist or distributed features. As noted earlier, localist features reflect "symbolic" information; that is, each node represents a concrete concept. In distributed encoding, by contrast, a concept is represented by a pattern of activation across an array of nodes, none of which reflects a symbolic concept but rather some subsymbolic microfeature of it (Thorpe, 1994). We used a localist encoding scheme to facilitate understanding of the processing mechanisms underlying connection formation. However, localist encoding is far from realistic biologically and psychologically, since it implies that each concept is stored in a single processing unit and, differing levels of activation aside, is always perceived in an identical manner. In contrast to such a purely localist encoding scheme, a distributed activation pattern allows noisy or incomplete inputs to attain a reasonable level of activation from previously seen similar inputs (see Smith & DeCoster, 1998), and it degrades gracefully under partial damage. Given the advantages of distributed encoding, is it possible to replicate our localist simulations with a distributed representation?
To answer this question, we reran all the simulations with a distributed encoding scheme in which each concept (e.g., trait, behavior, situation, and so on)
Table 8. Discounting as a Function of the Competing Actor's Sample Size (Simulation 7)

                                    Competitor  Target  Trait (e.g., intelligent)
Condition 1: Small sample
#1   Competitor actor                   1         0        1
#5   Target and competitor actor        1         1        1
Condition 2: Large sample
#5   Competitor actor                   1         0        1
#5   Target and competitor actor        1         1        1
Test
     Competitor actor                   1         0        ?
     Target actor                       0         1        ?

Note. Schematic representation of the experimental design of Van Overwalle (2001). Cell entries denote external activation; # = number of trials. Each condition was simulated separately.

2 To simulate that the two actors worked independently rather than together on the quiz, one would have to assume that each actor is associated with a different behavior or outcome, each implying the same intelligent trait. In that case, no competition would arise and no discounting would be predicted.
is represented by a pattern of activation across five nodes rather than by a single node. All simulations were run with 50 participants, and each set of 10 participants received a different random activation pattern for each concept, to ensure that the simulation results generalized across activation patterns. For each participant and trial, random activation (i.e., noise) was added to simulate the imperfect conditions of perceptual encoding (see Table 9 for details). The fit to the observed data was measured by computing the correlation between the observed and simulated means. These correlations are merely indicative, as the number of means (four or more) is too small to obtain reliable differences between correlations. The correlation of the original localist simulations is given for comparison.
As can be seen, all the distributed simulations attained a good fit to the data. In all cases, the pattern of results of the original localist simulations was reproduced. This suggests that the underlying principles and mechanisms that we claim are responsible for the simulation results hold not only in the more artificial context of a localist encoding, but also in the more realistic context of a distributed encoding.
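The distributed re-coding can be sketched as follows (our own toy version: a feedforward associator, hand-chosen 5-node patterns for each concept rather than patterns drawn from the normal distribution described in the note to Table 9, and trial noise with SD = .20):

```python
import numpy as np

rng = np.random.default_rng(0)
lr = 0.15

# Each concept is a pattern over five nodes, not a single localist node.
actor1   = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
actor2   = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
friendly = np.array([1.0, 1.0, 0.0, 0.0, 1.0])
hostile  = np.array([0.0, 0.0, 1.0, 1.0, 1.0])

w = np.zeros((5, 5))
for _ in range(40):                          # noisy interleaved trials
    for a, t in [(actor1, friendly), (actor2, hostile)]:
        x = a + rng.normal(0.0, 0.20, 5)     # fresh perceptual noise
        w += lr * np.outer(x, t - x @ w)     # delta learning rule

cue = actor1 + rng.normal(0.0, 0.20, 5)      # noisy probe of actor 1
out = cue @ w
r_friendly = np.corrcoef(out, friendly)[0, 1]
r_hostile  = np.corrcoef(out, hostile)[0, 1]
print(round(r_friendly, 2), round(r_hostile, 2))
```

Despite the noise, the retrieved pattern correlates much more strongly with the friendly pattern than with the hostile one, illustrating why the distributed runs in Table 9 reproduce the localist results.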
In our discussion of the three properties of the delta algorithm, one direction of the connections was typically responsible for replicating the phenomena. In particular, we focused on connections running from input (usually including actors and behaviors) to output (including trait categories), with the exception of the diffusion property in Simulation 5, where this direction was reversed. To verify that these input→trait connections are of primary theoretical interest, the simulations were rerun with a feedforward pattern associator (McClelland & Rumelhart, 1988) consisting only of feedforward input→trait connections (see also the ordering of the nodes from left to right in Tables 2 to 9).
As shown in Table 9, the feedforward architecture performed almost as well as the original simulations in most cases. An important exception was the simulation of recency and primacy effects in serial position weights (Simulation 2). As noted earlier, a feedforward network is unable to reproduce the critical finding of attenuated recency in continuous judgments and robust primacy in final judgments. Furthermore, in Simulation 6 there was an unexpected difference between the general and specific demand conditions under no load, although, as expected, the overall discounting in these two conditions was greater than under load. This confirms that for most phenomena in person perception, one direction of the connections in the network is the most important. This does not alter the fact that the additional lateral and backward connections play a role, albeit a minor one.
Nonlinear Recurrent Model
We argued earlier that a recurrent model with a linear activation update algorithm and a single internal update cycle (to capture the internal activation from related nodes) suffices to reproduce the social phenomena of interest. This contrasts with other social researchers, who have used a nonlinear activation update algorithm and many more internal cycles (Read & Montoya, 1999; Smith & DeCoster, 1998). Are these model features necessary or even preferable? To answer this question, we reran all our simulations with a nonlinear activation algorithm and 10 cycles (i.e., 1 external and 9 internal).
As can be seen in Table 9, most of the simulations showed no substantial improvement over the original simulations, although the nonlinear model provided a reasonable fit. In the simulation of the integration of situational information (Simulation 6), the number of internal cycles had to be reduced from nine to four to obtain meaningful results. The reason is that the nonlinear update dampens the competitor's trait activation when it drifts
Table 9. Fit and Robustness of the Simulations, Including Alternative Encodings and Models

No. & Topic                         Original    Distributed  Feedforward  Nonlinear
                                    Simulation                            Recurrent
1   Online integration                .98          .95          .97          .96
2a  Serial position - continuous      .94          .89          .85b         .81
2b  Serial position - final           .71          .72          <0b          <0a,b
3   Asymmetric cues                   .96          .91          .91          .89
4   Inconsistent behavior             .96          .96          .92          .82
5   Assimilation and contrast         .99         1.00          .99         1.00
6   Situational integration           .97          .96          .93b
7   Discounting and sample size       .99          .99          .99          .99

Note. Cell entries are correlations between the simulated means (averaged across randomizations) and the empirical data. For the distributed encoding, we ran 50 "participants"; each concept was represented by five nodes with an activation pattern drawn from a normal distribution with M = the activation of the original simulation and SD = .20 (five such random patterns were run, one for each set of 10 "participants", and averaged), and additional noise was added on each trial, drawn from a normal distribution with M = 0 and SD = .20. For the nonlinear auto-associative model, the parameters were E = I = Decay = .15 and internal cycles = 9 (McClelland & Rumelhart, 1988). We searched for the best-fitting learning parameter for all alternative models.
a Number of internal cycles = 4. b The predicted pattern was not reproduced.
too high (above +1), pulling them back to the default ceiling of +1 and thereby weakening competition. This weakening of competition can sometimes be avoided by using fewer internal cycles. However, when simulating serial position weights (Simulation 2b), reducing the number of internal cycles to four or even to one was not enough to produce a primacy effect. The reason is the same: the nonlinear update drives the mutual reductions of actor and trait activations (which slowed down learning and caused primacy) back to the default ceiling of +1. Taken together, this suggests that a linear activation update algorithm with a single internal cycle is sufficient to simulate many impression formation phenomena.3 This should come as no surprise. In recurrent simulations of other problems, such as the formation of semantic concepts, multiple internal cycles were useful for performing "clean-ups" in the network, so that when a perceptual input was activated (e.g., hearing the word "cat"), the activations eventually settled into attractor-like representations of the associated semantic concept with a predetermined conceptual meaning (e.g., McLeod et al., 1998, pp. 145-148). Because no such distinction between a perceptual and a conceptual level was made here, multiple internal cycles served no real function.
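The contrast between the two update rules can be made concrete in a few lines of code. This is our own illustrative sketch, not the authors' simulation code: the weights, the parameters E and D, and the two-node layout are assumptions chosen only to show how the nonlinear McClelland-Rumelhart rule squeezes activations back under the +1 ceiling, whereas a single linear cycle can exceed it.

```python
import numpy as np

def linear_update(ext, W):
    """One external plus one internal cycle with a linear activation update."""
    return ext + W @ ext  # activation may exceed +1

def nonlinear_update(ext, W, E=0.15, D=0.15, cycles=9):
    """Nonlinear update in the style of McClelland & Rumelhart (1988):
    positive net input drives activation toward the ceiling of +1,
    negative net input toward the floor of -1, with decay D."""
    a = ext.copy()
    for _ in range(cycles):
        net = ext + W @ a
        da = np.where(net > 0, E * net * (1 - a), E * net * (a + 1)) - D * a
        a = a + da
    return a

# Two nodes with a mutually excitatory connection (illustrative values).
W = np.array([[0.0, 0.8],
              [0.8, 0.0]])
ext = np.array([1.0, 1.0])

lin = linear_update(ext, W)      # exceeds the +1 ceiling
non = nonlinear_update(ext, W)   # settles below the +1 ceiling
print(lin, non)
```

Because the nonlinear rule keeps activations bounded, the mutual activation reductions that drive competition in the linear model are partly undone, which is exactly the dampening discussed above.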
Parallel constraint satisfaction model
We can be brief about Kunda and Thagard's (1996) parallel constraint satisfaction model. Because this model has no learning algorithm, it lacks the acquisition property and therefore cannot reproduce any of the simulations we have presented. We see no way to remedy this short of major changes that would surely alter the model dramatically.
Tensor Product Models
The tensor product model is an important alternative connectionist approach to the formation and change of person and group impressions (Kashima & Kerekes, 1994; Kashima et al., 2000). An important difference from our recurrent model is that the tensor product model uses a Hebbian learning algorithm. This form of learning has the decisive disadvantage that it lacks the competition property. Therefore, social phenomena explained by this property, such as contrast, situational correction, and discounting (Simulations 5 to 7), cannot be simulated with this model, at least not without additional assumptions.
This model also requires the ad hoc assumption of different context representations before and after a judgment (Kashima & Kerekes, 1994), an assumption that was not necessary in our simulations (see Simulation 2). The tensor product model appears able to simulate most of the other phenomena we modeled, although we are unsure about the diffusion property (Simulation 4). Of course, it remains an open question how well the tensor product model would fit the data and whether it would perform as well as the recurrent model.
In this article, we have provided an overview of some key findings on impression formation and shown how they can be explained within a coherent framework. This connectionist perspective offers a new view of how information is encoded, how it is structured and activated, and how it is accessed and used for social judgments. This view differs from previous theories of impression formation, which relied on metaphors such as algebraic calculation (Anderson, 1981; Busemeyer, 1991; Hogarth & Einhorn, 1992), stage-wise integration of information (Gilbert, 1989), or the spreading of activation in constraint satisfaction networks with fixed weights (Kunda & Thagard, 1996; Read & Marcus-Newhall, 1993; Shultz & Lepper, 1996). The problem with these various metaphors is that they provide a rather inflexible, incomplete, and fragmentary description of the mechanisms of person perception.
In contrast, the connectionist approach proposed in this article, although based throughout on the same general autoassociative architecture and processing algorithm, was applied in such a way that it accommodates a wide variety of impression formation phenomena. Furthermore, we have shown that this model offers an alternative interpretation of previous algebraic models (Anderson, 1981; Busemeyer, 1991; Hogarth & Einhorn, 1992). In addition, the model can also account for the learning of social knowledge structures. This includes not only the episodic relations between actors and their traits and behaviors (Hamilton et al., 1980), but also the more enduring semantic knowledge relating behaviors to traits (Skowronski & Carlston, 1987, 1989). This approach can therefore potentially be used to study the development of the structures underlying social knowledge in infants and children.
A basic assumption in our simulations of the development of semantic trait meaning is that traits are viewed as prototypes, with strong associations between particular exemplars (behaviors) and categories (traits) arising from more frequent exposure to behavior-trait pairings. This assumption may seem questionable, because behaviors or traits that are highly prototypical of a trait category are very rare and may never have been observed before. How often, after all, are we confronted with someone who is extremely honest or dishonest? One could therefore argue that these prototypes of extreme or idealized traits are not simply retrieved from memory when judgments are formed, but are constructed as needed. Contrary to this notion, however, research has shown that exceptional exemplars that have never been observed before are judged more typical and are categorized faster than exceptional exemplars that have been observed before (Nosofsky, 1991). Hence, limited exposure to exceptional exemplars is perhaps crucial for drawing extreme trait inferences; otherwise, one might be tempted to classify exceptional exemplars into distinct categories to which normal people do not belong, such as weirdos or heroes. Note also that in our simulations (e.g., Simulation 3), extreme inferences were assumed to result not from a higher frequency of behaviors (in fact, the frequencies matched those of neutral behaviors) but from a broader configuration of behaviors. Thus, not the mere frequency of trait-relevant behaviors, but the configuration of many behaviors (some typical and some extreme) was sufficient to evoke extreme inferences.

VAN OVERWALLE & LABIOUSE

3The simulation results of Smith and DeCoster (1998, Simulations 1 to 3) could also be obtained with a linear activation update.
We have focused largely on the model as a learning device, that is, as a mechanism for associating patterns that reflect social concepts through very basic learning processes. A major advantage of a connectionist perspective is that complex social reasoning and learning can be accomplished by assembling an array of simple interconnected elements, which greatly enhances the computational power of the network, and by gradually adjusting the connection weights with the delta learning algorithm. We have shown that this learning algorithm gives rise to a number of novel properties, including the acquisition property responsible for sample size effects, the competition property responsible for discounting, and the diffusion property responsible for higher recall of inconsistent information. These properties can explain most of our simulations of social judgments and behavior. In contrast, introductory treatments of the autoassociator (e.g., McClelland & Rumelhart, 1988; McLeod et al., 1998; see also Smith & DeCoster, 1998) emphasize its other capabilities, including its content-addressable memory, its pattern-completion ability, and its tolerance of error and noise.
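The acquisition and competition properties of delta learning can be demonstrated with a minimal sketch. This is our own illustration, not the authors' simulation code; the cue patterns, learning rate, and trial counts are arbitrary choices made only to expose the two properties.

```python
import numpy as np

def delta_train(trials, n_cues, lr=0.2):
    """Delta rule: weights change in proportion to the prediction error."""
    w = np.zeros(n_cues)
    history = []
    for cues, target in trials:
        cues = np.asarray(cues, dtype=float)
        error = target - cues @ w      # (target - prediction)
        w += lr * cues * error
        history.append(w.copy())
    return w, history

# Acquisition property: the actor->trait weight grows with sample size,
# so more behavioral evidence yields a stronger trait inference.
_, hist = delta_train([([1.0], 1.0)] * 10, n_cues=1)

# Competition property: a cue that always co-occurs with a second cue
# (e.g., an actor together with a situational cause) ends up with a
# weaker weight than the same cue presented alone -- i.e., discounting.
alone, _ = delta_train([([1.0, 0.0], 1.0)] * 10, n_cues=2)
paired, _ = delta_train([([1.0, 1.0], 1.0)] * 10, n_cues=2)

print(hist[1][0], hist[9][0])   # weight after 2 vs. 10 trials
print(alone[0], paired[0])      # the paired cue's weight is discounted
```

Both effects fall directly out of the error-driven update: with a single cue the error (and hence the weight) accumulates over trials, while with two redundant cues the shared prediction error forces them to divide the weight between them.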
What are the implications of this work for theories of impression formation? The main contribution of this article is that a variety of phenomena were simulated with the same general network model (differing only in the learning rate parameter and in the learning history), suggesting that these phenomena rest on the same basic information processing principles, at least during early processing. We hope that providing a common framework for these diverse phenomena will spur further research in areas of social psychology that are usually considered too diverse to be brought under a single theoretical heading. In addition, the model can not only accommodate previous empirical data but also generate new hypotheses that can be tested in a classical experimental setting. We briefly discuss some potential problems and research questions arising from the model.
Knowledge acquisition. To what extent are the learning histories assumed in our simulations correct? What mechanisms and architectural considerations are necessary to maintain the knowledge base of the network? How is prior (trait) knowledge combined with new (behavioral) information? One earlier proposal is that semantic behavior-trait associations stored in semantic memory are used spontaneously when new behavioral information is received. This proposal is consistent with most research on spontaneous trait inferences (see Uleman, 1999). Answers to these questions may also be obtained through laboratory replications of the assumed learning histories, which should reveal their equivalence with participants' prior knowledge and their similar implications for the inferences discussed here.
Automatic versus conscious processing. Our approach makes no clear and explicit distinction between automatic and intentional processing, or between implicit and explicit processing. Setting the learning rate to a lower or higher (default) level could often simulate this distinction, suggesting that automatic versus conscious processing rests in part on shallow versus deep learning of the information. These different levels of learning lead to a different weighting of, for example, earlier information relative to novel information, and can thus lead to different judgments. Some researchers (e.g., Smith & DeCoster, 2000) have proposed a distinction between two processing modes: a slow-learning (connectionist) pattern-completion mode, and a more effortful (symbolic) mode in which rules and inferences are represented explicitly and symbolically. Other theorists have suggested, in line with our approach, that such a sharp distinction is not necessary and that many social judgments (although differing in content) may share the same underlying process. For instance, within a single connectionist network, differences between shallow and deep learning are possible under the assumption that "explicit, conscious knowledge ... contains memory traces of higher quality than tacit knowledge" (Cleeremans
& Jiménez, 2002, p. 21). This approach has also been used to simulate differences between heuristic and central processes involved in attitude change (Van Overwalle & Siebler, 2002) and between implicit and explicit reasoning (Kinder & Shanks, 2001).
Heuristics. Although heuristics are commonly viewed as rules of thumb that deviate from logical reasoning and often lead to biased judgments (Kahneman, Slovic, & Tversky, 1982), we argue that, viewed through a connectionist lens, they provide insight into how the brain works. For example, the availability heuristic, invoked to explain why many judgments are influenced by facts and arguments that were recently or frequently available in memory, can be viewed within a connectionist framework as information that was recently primed or activated and that automatically spreads to other related concepts, thereby influencing judgments about them (as we saw in the assimilation simulations). Of course, this does not rule out that subjective experiences associated with the activation of memory traces, such as the ease with which a given number of exemplars can be retrieved, further shape how people use the activated information in subsequent judgments (Schwarz et al., 1991). Furthermore, the representativeness heuristic, which explains why categorization is often driven by the similarity between concepts rather than by base rates, may in fact reflect the strength of the connections between a category and its members (as shown in the simulation of the asymmetry of ability- and morality-related behaviors). Finally, the anchoring and adjustment heuristic, originally proposed to explain why judgments often stick to an initial anchor, can be understood simply as a property of the delta learning algorithm: initial weight adjustments are large (i.e., anchoring) because the error in the network is large, whereas subsequent adjustments become smaller and smaller as the error shrinks. This may also explain insufficient adjustment at later stages of learning, or when cognitive capacity is reduced, as during the integration of situational information.
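The reading of anchoring and adjustment as a by-product of error-driven learning can be sketched in a few lines (our illustration; the learning rate, target, and trial count are arbitrary): the first update is the largest because the error is largest, and each later adjustment shrinks.

```python
# Anchoring and adjustment as diminishing delta-rule updates.
lr, w, target = 0.3, 0.0, 1.0       # illustrative learning rate and target
adjustments = []
for _ in range(8):
    delta = lr * (target - w)       # delta rule for a single always-on cue
    w += delta                      # weight creeps toward the target
    adjustments.append(delta)

print(adjustments[0], adjustments[-1])  # first update dwarfs the last
```

Because each adjustment is a fixed fraction of the remaining error, the sequence of weight changes decays geometrically: large initial "anchoring" steps followed by ever smaller "adjustments".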
Limitations and Future Directions
Inevitably, given the breadth of the impression formation literature, we could not include many other interesting findings and phenomena. Perhaps the most important area left out is group processes. Connectionist modeling may well help to explain how group identity emerges, how perceptions of group homogeneity change, how the accentuation of correlated traits is reinforced, and how illusory correlations and unrealistically negative stereotypes of minority groups arise. These questions are addressed by Van Rooy et al. (2003) using the same model as ours. These applications merely reflect our current thinking and will almost certainly be superseded by improved models in the future. We believe, however, that the core of the approach proposed here will remain.
Although we have attempted to show that a connectionist framework can potentially provide a parsimonious account of a variety of impression formation phenomena, we do not suggest that this is the only valid way to model social-cognitive phenomena. Rather, we advocate a multiple-views position in which connectionism plays a key role but coexists with other perspectives. We believe that strict neurological reductionism is untenable, particularly in personality and social psychology, where it is difficult to conceive of a connectionist model of high-level abstract constructs such as 'need for closure', 'prejudice', 'close relationships', 'motivation', and the like.
These limitations suggest a number of possible directions for extending the connectionist approach. First, a crucial improvement of our recurrent network could be the inclusion of hidden layers (McClelland & Rumelhart, 1988, pp. 121-126), possibly with conjunctive node coding (e.g., O'Reilly & Rudy, 2001) or exemplar nodes (e.g., Kruschke & Johansen, 1999), which could increase its power and capacity, for example in dealing with nonlinear interactions.
Second, a more modular architecture will almost certainly be required for the model to fit empirical data better. For example, a serious limitation of most connectionist models is known as "catastrophic interference" (McCloskey & Cohen, 1989; Ratcliff, 1990; see French, 1999, for a review): the tendency of neural networks to abruptly and completely forget previously learned information in the presence of new inputs. Although catastrophic interference has been observed when perceivers process novel information (see Simulation 1), it is untenable for a realistic model of long-term social-cognitive processing, in which prior knowledge, such as stereotypes, often resists change in the face of new information. In response to such observations, it has been suggested that the brain has evolved a dual hippocampal-neocortical memory system to solve this problem, in which new (mainly episodic) information is processed in the hippocampus, while old (mainly semantic) information is stored and consolidated in the neocortex (McClelland, McNaughton, & O'Reilly, 1995; Smith & DeCoster, 2000). Several modelers (Ans & Rousset, 1997; French, 1997) have proposed modular connectionist architectures that mimic this dual memory system, with one subsystem dedicated to the rapid learning of unexpected and novel information and the construction of episodic memory traces, and the other dedicated to the slow, incremental learning of the statistical regularities of the environment and the gradual consolidation of the information learned in the first subsystem.
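Catastrophic interference is easy to exhibit in a toy sketch (our illustration, not a result from the article; the two-node patterns are arbitrary): once a delta-rule network has mastered one association, training it on a conflicting one erases the first.

```python
import numpy as np

def train(W, pairs, lr=0.5, epochs=50):
    """Delta-rule training of a one-layer linear network."""
    for _ in range(epochs):
        for x, t in pairs:
            W += lr * np.outer(t - W @ x, x)  # error-driven weight update
    return W

def sq_error(W, pairs):
    return sum(float(np.sum((t - W @ x) ** 2)) for x, t in pairs)

x = np.array([1.0, 0.0])
old = [(x, np.array([1.0, 0.0]))]   # original association: x -> A
new = [(x, np.array([0.0, 1.0]))]   # later, conflicting input: x -> B

W = np.zeros((2, 2))
W = train(W, old)
err_before = sq_error(W, old)       # near zero: old knowledge acquired
W = train(W, new)
err_after = sq_error(W, old)        # large: old knowledge overwritten
print(err_before, err_after)
```

Because both associations share the same input pattern and the same weights, the second training phase has no way to preserve the first; a modular dual-memory architecture of the kind described above avoids exactly this by routing new learning into a separate, fast subsystem.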
There is substantial evidence for brain modularity, particularly the complementary learning functions of the hippocampus and neocortical structures (McClelland et al., 1995) and the dominant role of the amygdala in social judgment and the perception of emotions (Adolphs & Damasio, 2001; Adolphs, Tranel, & Damasio, 1998). A dual memory representation opens up the intriguing possibility that old trait knowledge and novel trait inferences can coexist in memory for a limited time. It seems to us that the next step in connectionist modeling of social cognition will be to explore connectionist architectures built from separate but complementary systems.
Third, as a special case of modularization, it will eventually be necessary to incorporate factors such as attention, awareness, and motivation, which are important in social cognition, into an improved model. Currently, attentional aspects of human information processing are not part of our network dynamics (variations were simply hand-coded as differences in learning rate), which focus almost exclusively on learning and pattern association. However, recent developments shed light on how an attention-switching mechanism might be implemented. O'Reilly and Munakata (2000) proposed a network model of attention and motivation based on the idea that a specialized module in the prefrontal cortex is able to actively maintain rapidly updatable activation, allowing that module to exert a sustained top-down biasing influence on processing elsewhere in the system. These actively maintained prefrontal representations can guide behavior and judgment according to goals, motivation, and other kinds of internal constraints.
Connectionist modeling of person impressions fits seamlessly into an integrative, multilevel analysis of human behavior (Cacioppo et al., 2000). Because cognition is inherently social, connectionism will at some point have to incorporate social constraints into its models. Conversely, social psychology needs to pay more attention to the biological basis of social behavior. Social and biological approaches to cognition can therefore be viewed as complementary endeavors with the common goal of achieving a clearer and deeper understanding of human behavior. We hope that connectionist accounts of social cognition will provide a common ground for this enterprise.
References

Adolphs, R., & Damasio, A. (2001). The interaction of affect and cognition: A neurobiological perspective. In J. P. Forgas (Ed.), Handbook of affect and social cognition (pp. 27-49). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Adolphs, R., Tranel, D., & Damasio, A. (1998). The human amygdala in social judgment. Nature, 393, 470-474.
Allison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: Role of the STS region. Trends in Cognitive Sciences, 4, 267-278.
Anderson, J. R. (1976). Language, memory, and thought. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Anderson, N. H. (1979). Serial position curves in impression formation. Journal of Experimental Psychology, 97, 8-12.
Anderson, N. H. (1981). Foundations of information integration theory. New York: Academic.
Anderson, N. H., & Farkas, A. J. (1973). New light on order effects in attitude change. Journal of Personality and Social Psychology, 28, 88-93.
Ans, B., & Rousset, S. (1997). Avoiding catastrophic forgetting by coupling two reverberating neural networks. Comptes Rendus de l'Académie des Sciences, Sciences de la Vie, 320, 989-997.
Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology, 41, 258-290.
Asch, S. E., & Zukier, H. (1984). Thinking about persons. Journal of Personality and Social Psychology, 46, 1230-1240.
Baker, AG, Berbier, MW, & Vallée-Tourangeau, F. (1989). Assessing a 2 × 2 contingency table: sequential processing and the learning curve. The Quarterly Journal of Experimental Psychology, 41B, 65-97.
Bargh, J. A., & Thein, R. D. (1985). Individual construct accessibility, person memory, and the recall-judgment link: The case of information overload. Journal of Personality and Social Psychology, 49, 1129-1146.
Betsch, T., Plessner, H., Schwieren, C., & Gütig, R. (2001). I like it but I don't know why: A value-account approach to implicit attitude formation. Personality and Social Psychology Bulletin, 27, 242-253.
Busemeyer, J.R. (1991). Intuitive statistical estimation. In N. Anderson (ed.), Contributions to Information Integration Theory: Volume 1. Cognition (pp. 189-215). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Busemeyer, J. R., & Myung, I. J. (1988). A new method for investigating prototype learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 3-11.
Cacioppo, J. T., Berntson, G. G., Sheridan, J. F., & McClintock, M. K. (2000). Multilevel integrative analyses of human behavior: Social neuroscience and the complementarity of social and biological approaches. Psychological Bulletin, 126, 829-843.
Cantor, N., & Mischel, W. (1977). Traits as prototypes: Effects on recognition memory. Journal of Personality and Social Psychology, 35, 38-48.
Chapman, G. B., & Robbins, S. J. (1990). Cue interaction in human contingency judgment. Memory & Cognition, 18, 537-545.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365-382.
Chun, W. Y., Spiegel, S., & Kruglanski, A. W. (2002). Assimilative behavior identification can also be resource dependent: The unimodel perspective on personal-attribution phases. Journal of Personality and Social Psychology, 83, 542-555.
Cleeremans, A., & Jiménez, L. (2002). Implicit learning and consciousness: A graded, dynamic perspective. In R. M. French & A. Cleeremans (Eds.), Implicit learning and consciousness: An empirical, philosophical, and computational consensus in the making (pp. 1-40). East Sussex, England: Psychology Press.
Dreben, E. K., Fiske, S. T., & Hastie, R. (1979). The independence of evaluative and item information: Impression and recall order effects in behavior-based impression formation. Journal of Personality and Social Psychology, 37, 1758-1768.
Eagly, A.H. & Chaiken, S. (1993). The psychology of attitudes. San Diego, California: Harcourt Brace.
Ebbesen, E. B., & Bowers, R. J. (1974). Proportion of risky to conservative arguments in a group discussion and choice shift. Journal of Personality and Social Psychology, 29, 316-327.
Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, 103, 193-214.
Fiedler, K., Walther, E., & Nickel, S. (1999). The auto-verification of social hypotheses: Stereotyping and the power of sample size. Journal of Personality and Social Psychology, 77, 5-18.
Försterling, F. (1992). The Kelley model as an analysis of variance analogy: How far can it be taken? Journal of Experimental Social Psychology, 28, 475-490.
French, R. M. (1997). Pseudo-recurrent connectionist networks: An approach to the "sensitivity-stability" dilemma. Connection Science, 9, 353-379.
French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3, 128-135.
Freund, T., Kruglanski, A. W., & Shpitzajzen, A. (1985). The freezing and unfreezing of impressional primacy: Effects of the need for structure and the fear of invalidity. Personality and Social Psychology Bulletin, 11, 479-487.
Fyock, J., & Stangor, C. (1994). The role of memory biases in stereotype maintenance. British Journal of Social Psychology, 33, 331-343.
Gannon, K. M., Skowronski, J. J., & Betz, A. L. (1994). Depressed mood and the processing of social information: Implications for order effects in impressions and for social memory. Social Cognition, 12, 263-280.
Gilbert, D. T. (1989). Thinking lightly about others: Automatic components of the social inference process. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought: Limits of awareness, intention, and control (pp. 189-211). New York: Guilford.
Gilbert, D. T., & Malone, P. S. (1995). The correspondence bias. Psychological Bulletin, 117, 21-38.
Gilbert, D. T., Pelham, B. W., & Krull, D. S. (1988). On cognitive busyness: When person perceivers meet persons perceived. Journal of Personality and Social Psychology, 54, 733-740.
Gluck, MA, & Bower, GH (1988). From conditioning to categorization: an adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.
Hamilton, DL, Driscoll, DM, & Worth, LT (1989). Cognitive organization of impressions: Effects of incongruence in complex representations. Journal of Personality and Social Psychology, 56, 925-939.
Hamilton, D. L., Katz, L. B., & Leirer, V. O. (1980). Cognitive representation of personality impressions: Organizational processes in first impression formation. Journal of Personality and Social Psychology, 39, 1050-1063.
Hansen, R. D., & Hall, C. A. (1985). Discounting and augmenting facilitative and inhibitory forces: The winner takes almost all. Journal of Personality and Social Psychology, 49, 1482-1493.
Hastie, R. (1980). Memory for behavioral information that confirms or contradicts a personality impression. In R. Hastie, T. M. Ostrom, E. B. Ebbesen, R. S. Wyer, D. L. Hamilton, & D. E. Carlston (Eds.), Person memory: The cognitive basis of social perception (pp. 155-177). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Hastie, R., & Kumar, P. A. (1979). Person memory: Personality traits as organizing principles in memory for behaviors. Journal of Personality and Social Psychology, 37, 25-38.
Heaton, A. W., & Kruglanski, A. W. (1991). Person perception by introverts and extraverts under time pressure: Effects of need for closure. Personality and Social Psychology Bulletin, 17, 161-165.
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411-428.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1-55.
Ito, T. A., & Cacioppo, J. T. (2001). Affect and attitudes: A social neuroscience approach. In J. P. Forgas (Ed.), Handbook of affect and social cognition (pp. 50-74). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, UK: Cambridge University Press.
Kashima, Y., & Kerekes, A. R. Z. (1994). A distributed memory model for averaging phenomena in person impression formation. Journal of Experimental Social Psychology, 30, 407-455.
Kashima, Y., Woolcock, J. and Kashima, E.S. (2000). Group Impressions as Dynamic Configurations: The Tensor Product Model of Group Impression Formation and Change. Psychological Review, 107, 914-942.
Kelley, HH (1967). Attribution in social psychology. Nebraska Symposium on Motivation, 15, 192-238.
Kinder, A., & Shanks, D. R. (2001). Amnesia and the declarative/nondeclarative distinction: A recurrent network model of classification, recognition, and repetition priming. Journal of Cognitive Neuroscience, 13, 648-669.
Kruglanski, A. W., & Freund, T. (1983). The freezing and unfreezing of lay-inferences: Effects on impressional primacy, ethnic stereotyping, and numerical anchoring. Journal of Experimental Social Psychology, 19, 448-468.
Kruglanski, A. W., Schwartz, S. M., Maides, S., & Hamel, I. Z. (1978). Covariation, discounting, and augmentation: Towards a clarification of attributional principles. Journal of Personality, 46, 176-189.
Kruschke, J.K. & Johansen, M.K. (1999). A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1083-1119.
Kunda, Z. & Thagard, P. (1996). Impression formation through stereotypes, traits, and behavior: A parallel constraint satisfaction theory. Psychological Review, 103, 284-308.
LaBerge, D. (1997). Attention, awareness, and the triangular circuit. Consciousness and Cognition, 6, 149-181.
LaBerge, D. (2000). Networks of attention. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (pp. 711-724). Cambridge, MA: MIT Press.
Lupfer, M. B., Weeks, M., & Dupuis, S. (2000). How pervasive is the negativity bias in judgments based on character appraisal? Personality and Social Psychology Bulletin, 26, 1353-1366.
Macrae, C. N., Hewstone, M., & Griffiths, R. J. (1993). Processing load and memory for stereotype-based information. European Journal of Social Psychology, 23, 77-87.
Manis, M., Dovalina, I., Avis, N. E., & Cardoze, S. (1980). Base rates can affect individual predictions. Journal of Personality and Social Psychology, 38, 231-248.
McClelland, J., McNaughton, B., & O'Reilly, R. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419-457.
McClelland, J. L., & Rumelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159-188.
McClelland, J. L., & Rumelhart, D. E. (1988). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: Bradford.
McCloskey, M. & Cohen, N.J. (1989). Catastrophic interference in connectionist networks: the sequential learning problem. The Psychology of Learning and Motivation, 24, 109-165.
McLeod, P., Plunkett, K., & Rolls, E. T. (1998). Introduction to connectionist modeling of cognitive processes. Oxford, England: Oxford University Press.
Medin, D.L. & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Moskowitz, G. B., & Skurnik, I. W. (1999). Contrast effects as determined by the type of prime: Trait versus exemplar primes initiate processing strategies that differ in how accessible constructs are used. Journal of Personality and Social Psychology, 76, 911-927.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
Nosofsky, R. M. (1991). Typicality in logically defined categories: Exemplar-similarity versus rule instantiation. Memory & Cognition, 19, 131-150.
Nosofsky, R. M., Kruschke, J. K., & McKinley, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 211-233.
O'Reilly, R.C. & Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. Cambridge, MA: MIT Press.
O'Reilly, R.C. & Rudy, J.W. (2001). Conjunctive representations, learning, and memory: Principles of cortical and hippocampal function. Psychological Review, 108, 311-345.
Pandelaere, M. & Hoorens, V. (2002). The role of behavioral categorization in action frequency estimation processes. Manuscript submitted for publication.
Phelps, E. A., O'Connor, K. J., Cunningham, W. A., Funayama, E. S., Gatenby, J. C., Gore, J. C., et al. (2000). Performance on indirect measures of race evaluation predicts amygdala activation. Journal of Cognitive Neuroscience, 12, 729-738.
Posner, M.I. (1992). Attention as a cognitive and neural system. Current Directions in Psychological Science, 1, 11-14.
Queller, S., & Smith, E. R. (2002). Subtyping versus bookkeeping in stereotype learning and change: Connectionist simulations and empirical findings. Journal of Personality and Social Psychology, 82, 300-313.
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.
Read, S. J., & Marcus-Newhall, A. (1993). Explanatory coherence in social explanations: A parallel distributed processing account. Journal of Personality and Social Psychology, 65, 429-447.
Read, SJ, & Montoya, JA (1999). An autoassociative model of causal reasoning and learning: Response to Van Overwalle's critique of Read and Marcus-Newhall (1993). Journal of Personality and Social Psychology, 76, 728-742.
Reeder, G. D. (1997). Dispositional inferences of ability: Content and process. Journal of Experimental Social Psychology, 33, 171-189.
Reeder, G.D. & Brewer, M.B. (1979). A schematic model of dispositional attribution in interpersonal perception. Psychological Review, 86, 61-79.
Reeder, G. D., & Fulks, J. L. (1980). When actions speak louder than words: Implicational schemata and the attribution of ability. Journal of Experimental Social Psychology, 16, 33-46.
Reeder, GD, & Spores, JM (1983). The attribution of morality. Journal of Personality and Social Psychology, 44, 736-745.
Rescorla, R.A. & Wagner, A.R. (1972). A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Non-Reinforcement. In AH Black & WF Prokasy (eds.), Classical Conditioning II: Current Research and Theory (pp. 64-98). New York: Appleton-Century-Crofts.
Rosch, E. H. (1978). Principles of categorization. In E. H. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 27-48). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Rosenfield, D., & Stephan, W. G. (1977). When discounting fails: An unexpected finding. Memory & Cognition, 5, 97-102.
Rumelhart, D.E. & McClelland, J.L. (1986). Parallel Distributed Processing: Studies in the Microstructure of Cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.
Sarle, W. S. (1994). Neural networks and statistical models. Proceedings of the Nineteenth Annual SAS Users Group International Conference.
Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991). Ease of retrieval as information: Another look at the availability heuristic. Journal of Personality and Social Psychology, 61, 195-202.
Shanks, D. R. (1985). Forward and backward blocking in human contingency judgement. Quarterly Journal of Experimental Psychology, 37B, 1-21.
Shanks, D. R. (1987). Acquisition functions in contingency judgment. Learning and Motivation, 18, 147-166.
Shanks, D. R. (1995). Is human learning rational? Quarterly Journal of Experimental Psychology, 48A, 257-279.
Shanks, D. R., Lopez, F. J., Darby, R. J., & Dickinson, A. (1996). Distinguishing associative and probabilistic contrast theories of human contingency judgment. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychology of learning and motivation (Vol. 34, pp. 265-311). New York: Academic Press.
Shultz, T. R., & Lepper, M. R. (1996). Cognitive dissonance reduction as constraint satisfaction. Psychological Review, 103, 219-240.
Skowronski, J. J., & Carlston, D. E. (1987). Social judgment and social memory: The role of cue diagnosticity in negativity, positivity, and extremity biases. Journal of Personality and Social Psychology, 52, 689-699.
Skowronski, J. J., & Carlston, D. E. (1989). Negativity and extremity biases in impression formation: A review of explanations. Psychological Bulletin, 105, 131-142.
Skowronski, J. J., & Gannon, K. (2000). Raw conditional probabilities are a flawed index of associative strength: Evidence from a single trait expectancy paradigm. Basic and Applied Social Psychology, 22, 9-18.
Skowronski, J. J., & Welbourne, J. (1997). Conditional probability may be a flawed measure of associative strength. Social Cognition, 15, 1-12.
Smith, E. R. (1996). What do connectionism and social psychology offer each other? Journal of Personality and Social Psychology, 70, 893-912.
Smith, E. R., & DeCoster, J. (1998). Knowledge acquisition, accessibility, and use in person perception and stereotyping: Simulation with a recurrent connectionist network. Journal of Personality and Social Psychology, 74, 21-35.
Smith, E. R., & DeCoster, J. (2000). Associative and rule-based processing: A connectionist interpretation of dual-process models. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 323-338). London: Guilford.
Smith, E. R., & Zarate, M. A. (1992). Exemplar-based model of social judgment. Psychological Review, 99, 3-21.
Srull, T. K. (1981). Person memory: Some tests of associative storage and retrieval models. Journal of Experimental Psychology: Human Learning and Memory, 7, 440-463.
Srull, T. K., Lichtenstein, M., & Rothbart, M. (1985). Associative storage and retrieval processes in person memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 316-345.
Stangor, C. and Duan, C. (1991). Effects of multiple task demands on memory for information about social groups. Journal of Experimental Social Psychology, 27, 357-378.
Stangor, C., & McMillan, D. (1992). Memory for expectancy-congruent and expectancy-incongruent information: A review of the social and social developmental literatures. Psychological Bulletin, 111, 42-61.
Stapel, D. A., Koomen, W., & van der Pligt, J. (1997). Categories of category accessibility: The impact of trait concept versus exemplar priming on person judgments. Journal of Experimental Social Psychology, 33, 47-76.
Stewart, R. H. (1965). Effect of continuous responding on the order effect in personality impression formation. Journal of Personality and Social Psychology, 1, 161-165.
Thorpe, S. (1994). Localized versus distributed representations. In MA Arbib (ed.), Handbook of Brain Theory and Neural Networks (pp. 949-952). Cambridge, MA: MIT Press.
CONNECTIONIST PERSON IMPRESSION FORMATION
Trope, Y. (1986). Identification and inferential processes in dispositional attribution. Psychological Review, 93, 239-257.
Trope, Y., & Gaunt, R. (2000). Processing alternative explanations of behavior: Correction or integration? Journal of Personality and Social Psychology, 79, 344-354.
Uleman, J.S. (1999). Spontaneous versus intentional inferences in impression formation. In S. Chaiken & Y. Trope (eds.), Dual-process theories in social psychology (pp. 141-160). New York: Guilford.
Van Overwalle, F. (1996). The relationship between the Rescorla-Wagner associative model and the probabilistic joint causality model. Psychologica Belgica, 36, 171-192.
Van Overwalle, F. (1997). Dispositional attributions require the joint application of the methods of difference and agreement. Personality and Social Psychology Bulletin, 23, 974-980.
Van Overwalle, F. (1998). Causal explanation as constraint satisfaction: A critique and a feedforward connectionist alternative. Journal of Personality and Social Psychology, 74, 312-328.
Van Overwalle, F. (2001). Discounting and augmentation of dispositional and causal attributions. Manuscript submitted for publication.
Van Overwalle, F. (2003). Acquisition of dispositional attributions: Effects of sample size and covariation. European Journal of Social Psychology, 33, 515-533.
Van Overwalle, F., & Jordens, K. (2002). An adaptive connectionist model of cognitive dissonance. Personality and Social Psychology Review, 6, 204-231.
Van Overwalle, F. & Siebler, F. (2002). A connectionist model of attitude formation and change. Manuscript submitted for publication.
Van Overwalle, F., & Van Rooy, D. (1998). A connectionist approach to causal attribution. In S. J. Read & L. C. Miller (Eds.), Connectionist models of social reasoning and social behavior (pp. 143-171). New York: Lawrence Erlbaum Associates, Inc.
Van Overwalle, F., & Van Rooy, D. (2001a). How one cause discounts or augments another: A connectionist account of causal competition. Personality and Social Psychology Bulletin, 27, 1613-1626.
Van Overwalle, F., & Van Rooy, D. (2001b). When more observations are better than less: A connectionist account of the acquisition of causal strength. European Journal of Social Psychology, 31, 155-175.
Van Rooy, D., Van Overwalle, F., Vanhoomissen, T., Labiouse, C., & French, R. (2003). A recurrent connectionist model of group biases. Psychological Review, 110, 536-563.
Wasserman, E. A., Kao, S. F., Van Hamme, L. J., Katagiri, M., & Young, M. E. (1996). Causation and association. The Psychology of Learning and Motivation, 34, 207-264.
Wells, G. L., & Ronis, D. L. (1982). Discounting and augmentation: Is there something special about the number of causes? Personality and Social Psychology Bulletin, 8, 566-572.
Wojciszke, B., Brycz, H. & Borkenau, P. (1993). Effects of information content and evaluative extremity on positivity and negativity bias. Journal of Personality and Social Psychology, 64, 327-335.
The Linear Autoassociative Model
In an autoassociative network, features and categories, or causes and effects, are represented by nodes that are all interconnected. Information processing in this model takes place in two phases. In the first phase the activation of the nodes is computed, and in the second phase the weights of the connections are updated (see also McClelland & Rumelhart, 1988).
Node activation. During the first phase of information processing, each node in the network can be activated by external sources. Because all nodes are interconnected, this activation spreads through the network and influences all the other nodes. The activation arriving from the other nodes is called the internal input. Together with the external input, this internal input determines the final activation pattern of the nodes, which reflects the short-term memory of the network.
In mathematical terms, each node i in the network receives an external input, denoted ext_i. In the autoassociative model, each node i also receives an internal input int_i, which is the sum of the activations of the other nodes j (denoted a_j) weighted by the strength of their connections, or
for all j ≠ i. Typical activations and weights lie between -1 and +1. The external input and the internal input are then summed to give the net input, or
E and I denote the degree to which the net input is determined by the external and the internal input, respectively. Typically, in a recurrent network, the activation of each node i is updated over a number of cycles until it eventually settles into a stable pattern that reflects the network's short-term memory. According to the linear activation update algorithm, the change in activation is determined by the following equation:
where D represents a memory decay term. In our simulations we used only one internal update cycle and the parameter values D = I = E = 1. Under these simplifying assumptions, the final activation of node i reduces to the sum of the external and internal input, or:
Weight update. After this first phase, the autoassociative model enters the second, learning phase, in which the short-term activations are consolidated into long-term weight changes so that the network can better reflect and anticipate future external input. In essence, weight changes are driven by the discrepancy between the internal input generated by the network's last update cycle and the external input received from outside sources.
int_i = \sum_{j \neq i} a_j \, w_{ij} \qquad (1)

net_i = E \cdot ext_i + I \cdot int_i \qquad (2)

\Delta a_i = net_i - D \cdot a_i \qquad (3)

a_i = ext_i + int_i \qquad (3')
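As an illustration, the activation phase under these simplifying assumptions can be sketched in a few lines of NumPy. The function and variable names are our own, and the two-node weight matrix is purely illustrative; this is a sketch of the equations above, not the original simulation code:

```python
import numpy as np

def activation_phase(ext, W, E=1.0, I=1.0, D=1.0):
    """One internal update cycle of the linear autoassociator.

    ext: external inputs, one per node; W: weight matrix with a zero
    diagonal, where W[i, j] is the connection weight from node j to node i.
    """
    a = ext.copy()                 # nodes start from their external activation
    internal = W @ a               # Equation 1: int_i = sum_{j != i} a_j * w_ij
    net = E * ext + I * internal   # Equation 2: net input
    a = a + (net - D * a)          # Equation 3: delta a_i = net_i - D * a_i
    return a, internal

# With D = E = I = 1 and one cycle, this reduces to a_i = ext_i + int_i (Eq. 3').
W = np.array([[0.0, 0.5],
              [0.5, 0.0]])
a, internal = activation_phase(np.array([1.0, 0.0]), W)
# a is [1.0, 0.5]: node 1 becomes active purely through its connection to node 0
```

Because the final activation equals the net input under these parameter values, a single internal cycle suffices in the simulations.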
This discrepancy, or error, is expressed formally in the delta algorithm (McClelland & Rumelhart, 1988, p. 166):
where w_ij is the weight of the connection from node j to node i, ε is a learning rate that determines how fast the network learns, and a_j is the current final activation of node j.
The presence of a trait or category was typically coded by setting the external input to +1, and to -1 for opposite traits or categories (lower values were also used; see the corresponding tables); otherwise the external activation remained at the zero resting level. The connection weights were updated after each trial. At the end of each simulation, the judgment of interest was tested by turning on the external input of the relevant nodes and reading off the resulting activation of the nodes representing that judgment (see the corresponding tables).
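Combining the two phases, a full simulation run under the same simplifying assumptions might look as follows. The three-node layout (a person node, a trait node, and an opposite-trait node), the learning rate, and the number of trials are illustrative assumptions, not the original simulation settings:

```python
import numpy as np

rate = 0.1                               # learning rate (epsilon), illustrative
n = 3                                    # node 0: person; 1: trait; 2: opposite trait
W = np.zeros((n, n))                     # all adjustable weights start at zero

# Ten trials pairing the person (+1) with the trait (+1); the opposite
# trait category is coded -1, following the coding scheme described above.
trial = np.array([1.0, 1.0, -1.0])
for _ in range(10):
    internal = W @ trial                       # one internal update cycle
    a = trial + internal                       # final activations (Equation 3')
    W += rate * np.outer(trial - internal, a)  # delta learning (Equation 4)
    np.fill_diagonal(W, 0.0)                   # no self-connections

# Test phase: turn on the person node alone and read off the trait activations.
probe = np.array([1.0, 0.0, 0.0])
impression = W @ probe
# impression[1] ends up clearly positive, impression[2] clearly negative
```

The test phase mirrors the procedure described above: priming the person node and reading the internal activation it sends to the trait nodes.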
Anderson's Averaging Rule and the Delta Algorithm
This appendix shows that the delta algorithm converges asymptotically to Anderson's (1981) averaging rule under two conditions: first, learning must have reached asymptote (i.e., after a sufficient number of trials), and second, the relative weights in Anderson's model must be representable by the relative frequencies of the person-trait pairings. Anderson's averaging rule for impression formation expresses the judgment of a person as:
where ω_i represents the weights and s_i the scale values of the traits.
This proof uses the same logic as Chapman and Robbins (1990) in their demonstration that the delta algorithm converges to the probabilistic expression for contingency. Following the conventional representation of covariation information, person impression information can be laid out in a two-cell contingency table. Cell a contains all cases in which the actor is described by the focal trait, while cell b contains all cases in which the actor is ascribed the opposite trait. For simplicity we use only two trait categories, although the proof extends readily to more categories.
In the recurrent local-encoding connectionist architecture used in the text, the target person and the trait categories are each represented by a node, linked by adjustable weights. When the target person is present, the person node receives an external activation, and this activation spreads to each trait node. We assume that the total activation arriving at the trait nodes (the internal activation int) after priming the person node reflects the impression of the person.
According to the delta algorithm in Equation 4, the weights w_i are adjusted in proportion to the error between the actual trait category (represented by its external activation ext) and the trait category predicted by the network (represented by its internal activation int). If we replace ext in Equation 4 by Anderson's scale values (s_1 for the focal trait and s_2 for the opposite trait), and if we use the default activation of 1 for a_j, the following equations can be written for the two cells of the contingency table:
The change in the overall impression is the sum of Equations 6 and 7, weighted by the corresponding frequencies a and b of the two cells, or:
These adjustments continue until asymptote, that is, until the error between the actual and the predicted trait category is zero. This implies that at asymptote the weight changes are zero, or Δw_i = 0, so that Equation 8 becomes:
Because the internal activation of the trait nodes reflects the trait impression of the person, this can be rephrased in Anderson's terms as:
where f represents the frequency with which a person and a trait occur together. As can be seen, Equation 9 has the same form as Equation 5. This demonstrates that, at asymptote, the delta algorithm predicts a weighted averaging function for overall impression judgments, in which Anderson's weights ω are given by the frequencies with which the persons and traits are presented together.
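The asymptotic result can be verified numerically with a minimal one-weight sketch. The scale values, cell frequencies, and learning rate below are arbitrary illustrative choices, not values from the original simulations:

```python
import random
random.seed(42)

s1, s2 = 0.8, -0.4   # scale values of the focal and the opposite trait
a, b = 3, 1          # cell frequencies: the focal trait is seen three times as often
epsilon = 0.02       # learning rate

# A single adjustable weight from the person node to the trait node; with a
# default sending activation of 1, the internal activation int equals w.
w = 0.0
targets = [s1] * a + [s2] * b
for _ in range(2000):                      # repeated random presentation
    for s in random.sample(targets, len(targets)):
        w += epsilon * (s - w)             # Equation 4 with ext = s and a_j = 1

weighted_average = (a * s1 + b * s2) / (a + b)   # Equation 9: here 0.5
# At asymptote, w fluctuates narrowly around weighted_average
```

With a small learning rate the trial-to-trial fluctuations shrink, and the weight tracks the frequency-weighted average of the scale values, as the proof predicts.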
judgment = \sum_i \omega_i s_i \,/\, \sum_i \omega_i \qquad (5)

For the a cell: \Delta w = \varepsilon (s_1 - int) \qquad (6)
For the b cell: \Delta w = \varepsilon (s_2 - int) \qquad (7)

\Delta w_i = a \, \varepsilon [s_1 - int] + b \, \varepsilon [s_2 - int] \qquad (8)

At asymptote (\Delta w_i = 0):
0 = a \, \varepsilon [s_1 - int] + b \, \varepsilon [s_2 - int]
0 = a [s_1 - int] + b [s_2 - int]
0 = [a s_1 + b s_2] - [a + b] \, int
int = [a s_1 + b s_2] \,/\, [a + b]

impression = \sum_i f_i s_i \,/\, \sum_i f_i \qquad (9)

\Delta w_{ij} = \varepsilon \, (ext_i - int_i) \, a_j \qquad (4)
[Figure residue: only axis ticks and legend labels remain of the simulation figures (serial position curves for the 1st through last rating; positive vs. negative trait simulations; consistent vs. inconsistent information; positive vs. negative prime; general vs. specific conditions; competitor vs. target actor; small vs. large sample size; observed data vs. simulation). The figures themselves could not be recovered.]