The Role of Disclosure in DeFi Markets
Decentralized Finance (DeFi) platforms use self-executing smart contracts to provide financial services and are programmed to post all information automatically on the public blockchain. Notwithstanding this public availability of blockchain information, DeFi platforms also extract public blockchain data and disclose summarized blockchain information on their Twitter accounts. This paper studies whether and how voluntary disclosure of blockchain information plays a role in the transparent DeFi market. I find that the number of blockchain-related tweets is associated both with an increase in the platform’s Total Value Locked (TVL) and with an increase in the total number of platform users. The relationship between blockchain-related tweets and TVL is strengthened when the tweets have greater information content and when users face higher information processing costs. This suggests that public blockchain transactions are too costly for users to process, so users rely on the platform’s disclosure of blockchain information. Overall, my results show that DeFi platforms can help users process and understand blockchain transactions by summarizing and disclosing them on Twitter.
Document graph representation learning
Much of the data on the Web can be represented in a graph structure, ranging from social and biological networks to academic and Web page graphs. Graph analysis has recently attracted escalating research attention due to its importance and wide applicability. Diverse problems can be formulated as graph tasks, such as text classification and information retrieval. As the primary information is the inherent structure of the graph itself, one promising direction, known as graph representation learning, is to learn the representation of each node, which can in turn fuel tasks such as node classification, node clustering, and link prediction.
As a specific type of graph data, documents are often connected in a graph structure. For example, Google Web pages hyperlink to other related pages, academic papers cite other papers, Facebook user profiles are connected in a social network, news articles with similar tags are linked together, etc. We call such data a document graph or document network. To better make sense of the meaning within these text documents, researchers have developed neural topic models. However, traditional topic models explore the content only, ignoring the connectivity. By modeling both the textual content within documents and the connectivity across documents, we can discover more interpretable topics to understand the corpus and better fulfill real-world applications, such as Web page search, news article classification, academic paper indexing, and friend recommendation based on user profiles. In this dissertation, we aim to develop models for document graph representation learning.
First, we investigate an extension of Auto-Encoders, a family of shallow topic models. Intuitively, connected documents tend to share similar latent topics. Thus, we allow the Auto-Encoder to extract the topics of an input document and reconstruct its adjacent neighbors. This allows documents in a network to collaboratively learn from one another, such that close neighbors have similar representations in the topic space. Extensive experiments verify the effectiveness of our proposed model against both graphical and neural baselines.
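To make the neighbor-reconstruction idea concrete, below is a minimal sketch in PyTorch, assuming bag-of-words document vectors and a single-layer encoder/decoder; the class and function names are illustrative and not the dissertation's actual implementation.

```python
import torch
import torch.nn as nn

class NeighborReconstructingAE(nn.Module):
    """Encode a document into topic proportions, then decode toward a
    neighbor's bag-of-words (illustrative sketch, not the exact model)."""

    def __init__(self, vocab_size: int, n_topics: int):
        super().__init__()
        self.encoder = nn.Linear(vocab_size, n_topics)
        self.decoder = nn.Linear(n_topics, vocab_size)

    def forward(self, doc_bow: torch.Tensor) -> torch.Tensor:
        theta = torch.softmax(self.encoder(doc_bow), dim=-1)  # topic proportions
        return self.decoder(theta)                            # vocabulary logits

def neighbor_reconstruction_loss(model, doc_bow, neighbor_bow):
    # Reconstruct the *neighbor* rather than the input document itself,
    # pushing connected documents toward similar topic proportions.
    logits = model(doc_bow)
    return -(neighbor_bow * torch.log_softmax(logits, dim=-1)).sum(-1).mean()
```

In training, each document would be paired with neighbors sampled from its adjacency list, so that the learned topic space reflects both content and connectivity.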
Second, we focus on dynamic modeling of document networks. In many real-world scenarios, documents are published in a sequence and are associated with timestamps. For example, academic papers published over the years exhibit the development of research topics. To incorporate such temporal information, we introduce a neural topic model aimed at learning unified topic distributions that incorporate both document dynamics and network structure.
Third, we observe that documents are usually associated with authors. For example, news reports have journalists specializing in writing about certain types of events, academic papers have authors with expertise in certain research topics, etc. Modeling authorship information can benefit topic modeling, since documents by the same authors tend to reveal similar semantics. This observation also holds for documents published at the same venues. We propose a Variational Graph Author Topic Model that integrates topic modeling with authorship and venue modeling in a unified framework.
Fourth, most previous topic models treat documents of different lengths uniformly, assuming that each document is sufficiently informative. However, shorter documents may contain only a few word co-occurrences, resulting in inferior topic quality. Some other works assume that all documents are short and leverage external auxiliary data, e.g., pretrained word embeddings and document connectivity. Orthogonal to existing works, we remedy this problem within the corpus itself through meta-learning, proposing a Meta-Complement Topic Model that improves the topic quality of short texts by transferring the semantic knowledge learned on long documents to complement semantically limited short texts.
Fifth, we explore the modeling of short texts on the graph. Text embedding models usually rely on word co-occurrences within documents to learn effective representations, so short texts with only a few words can hinder the learning process. To accurately discover the main topics of these short documents, we leverage a statistical concept, the optimal transport barycenter, to incorporate external knowledge, such as word embeddings pre-trained on a large corpus, into topic modeling. The proposed model shows state-of-the-art performance.
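As a rough illustration of the barycenter idea, the following sketch computes an entropically regularized Wasserstein barycenter of word histograms on a fixed support via iterative Bregman projections (Benamou et al., 2015). It is a generic, simplified stand-in rather than the proposed model; the cost matrix here would come from distances between pre-trained word embeddings.

```python
import numpy as np

def ot_barycenter(P, M, reg=0.1, weights=None, n_iter=200):
    """P: (k, n) array of k word histograms over n shared support points.
    M: (n, n) ground-cost matrix, e.g., squared distances between
    pre-trained word embeddings. Returns the barycenter histogram."""
    k, n = P.shape
    w = np.full(k, 1.0 / k) if weights is None else weights
    K = np.exp(-M / reg)                        # entropic Gibbs kernel
    v = np.ones((k, n))
    for _ in range(n_iter):
        u = P / (v @ K.T)                       # enforce each input's marginal
        Ktu = u @ K                             # row i holds K^T u_i
        b = np.prod(Ktu ** w[:, None], axis=0)  # elementwise geometric mean
        v = b[None, :] / Ktu                    # enforce shared barycenter marginal
    return b / b.sum()
```

Intuitively, the barycenter pools the semantics of a short text's few words through the embedding-based cost matrix, yielding a smoother, topic-friendly distribution.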
In this presentation, we will discuss various aspects of semantic data representations, which we broadly group into two categories. First, effective semantic representations concern the capabilities of these representations, such as task performance and interpretability. Next, efficient semantic representations concern the utilization of these representations, such as their storage size as well as their generalizability across multiple tasks.
Our discussion revolves around two primary forms of data: textual data and knowledge bases. For textual data representations, we introduce a novel approach that improves efficiency by discarding representations while limiting the impact on downstream task effectiveness. For knowledge base representations, we explore a novel measure of node importance in knowledge graphs and present a heuristic approach for selecting such nodes in large knowledge graphs.
We also discuss the use of semantic representations in real world applications, and propose a novel approach for the cold-start problem when training Large Language Models in the legal domain.
Essays on stakeholder economy
The dissertation consists of two chapters on the stakeholder economy. It looks at how firms interact with stakeholders, including not only investors, employees, customers, and governments, but also the broader community and society at large, and examines how such interactions affect corporate behavior in China and in the global setting. The first chapter studies how societal culture shapes firm behavior and growth by analyzing the trade-off of relying on trust in acquiring stakeholder resources, testing with data on the number of historic Confucian schools surrounding a current firm’s location in China. Companies more exposed to Confucianism have greater social contributions and stakeholder protection, and more business courtesy expenses, patents, and trade credits, which match the five basic virtues of Confucianism: benevolence, righteousness, courteousness, wisdom, and trustworthiness. Our results cannot be explained by other cultural traits and are robust to using the distance to the prototypical Confucian academies in the Song Dynasty and the intensity of rivers in the local region as instrumental variables. The effects are likely transmitted via a firm’s interactions with market participants, politicians’ ideology, and the board of directors. Stronger Confucianism is associated with greater profitability and growth. Our paper contributes to the literature by providing more granular evidence on how culture affects economic activities through firm-level channels that have not been systematically explored.
In the second chapter, we employ a novel firm-level dataset on the monetized value of unpriced earnings losses due to climate-related transition risks to study the magnitudes, determinants, and consequences of a firm’s carbon earnings risks across different scenarios based on national pledges to Paris Agreement targets and across different time horizons. We find that carbon earnings risks on average account for about 15 percent of a firm’s total earnings and are largely driven by unobservable industry- and firm-level heterogeneities. We also find that companies with greater carbon earnings risks tend to have more green innovations, discretionary accruals, and outsourced production. We use the staggered introduction of country-level carbon taxes and emission trading systems, as well as state-level climate-related disasters, as instrumental variables to address potential endogeneity issues. Our findings highlight the importance of accounting for transition risks in a firm’s financial statements. Our work complements the growing climate finance literature on the effect of climate risks on corporate policies by providing more comprehensive evidence on the motivation of corporate reactions, driven by material carbon earnings risks that are reflected on a firm’s financials.
Fortifying the Seams of Software Systems
A seam in software is a place where two components within a software system meet. There are more seams in software now than ever before, as modern software systems rely extensively on third-party software components, e.g., libraries. The increasing complexity and interconnectedness of software systems make the reliability of these components and their proper use crucial. While using software components can ease the development process, it also introduces risks and challenges due to the interaction between different components.
This dissertation tackles problems associated with the reliability of third-party software components. Developers write programs that interact with libraries through their Application Programming Interfaces (APIs). Analysis of API-using code requires knowledge of an API and its usage constraints. Hence, we develop techniques to infer and model the usage constraints of APIs. Next, we apply the insights gleaned from our studies to support bug-finding techniques using static and dynamic analysis. Then, we look into larger software systems comprising multiple components. We propose techniques for mining rules to monitor the joint behaviors of apps, and for exploiting known library vulnerabilities from a project importing a library. These techniques aim to assist developers in better understanding and using third-party components, and to detect weaknesses in the software system before they can be exploited by malicious actors.
Continual Learning with Neural Networks
Recent years have witnessed tremendous successes of artificial neural networks in many applications, ranging from visual perception to language understanding. However, such achievements have been mostly demonstrated on a large amount of labeled data that is static throughout learning. In contrast, real-world environments are always evolving, where new patterns emerge and the older ones become inactive before reappearing in the future. In this respect, \emph{continual learning} aims to achieve a higher level of intelligence by learning online on a data stream of several tasks. As it turns out, neural networks are not equipped to learn continually: they lack the ability to facilitate knowledge transfer and remember the learned skills. Therefore, this thesis has been dedicated to developing effective continual learning methods and investigating their broader impacts on other research disciplines.
Towards this end, we have made several contributions to facilitate continual learning research. First, we contribute to the classical continual learning framework by analyzing how Batch Normalization affects different replay strategies. We discovered that although Batch Normalization facilitates continual learning, it also hinders the performance of older tasks. We named this the \emph{cross-task normalization phenomenon} and conducted a comprehensive analysis to investigate and alleviate its negative effects.
Then, we developed a novel \emph{fast and slow learning} framework for continual learning based on the \emph{Complementary Learning Systems}~\cite{kumaran2016learning,mcclelland1995there} of human learning. Particularly, the fast and slow learning principle suggests modeling continual learning at two levels: general representation learning and learning of individual experiences. This principle has been our main tool for addressing the challenge of learning new skills while remembering old knowledge in continual learning. We first realized the fast-and-slow learning principle in Contextual Transformation Networks (CTN), an efficient and effective online continual learning algorithm. Then, we proposed DualNets, which incorporates representation learning into continual learning and introduces an effective strategy to utilize general representations for better supervised learning. DualNets not only addresses CTN's limitations but is also applicable to general continual learning settings.
Through extensive experiments, we find that DualNets is effective and achieves strong results in several challenging continual learning settings, even in complex scenarios with limited training samples or distribution shifts.
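A minimal sketch of the fast-and-slow idea follows, assuming a simple two-network split with different update timescales; it is illustrative only and not the published DualNets architecture.

```python
import torch
import torch.nn as nn

class FastSlowNet(nn.Module):
    """Slow network: general representations; fast network: a light
    adapter for the current experience (illustrative sketch)."""

    def __init__(self, in_dim=784, feat_dim=128, n_classes=10):
        super().__init__()
        self.slow = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.fast = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        return self.fast(self.slow(x))

model = FastSlowNet()
# Two optimizers on different timescales: the slow net changes gradually
# (general knowledge), the fast net adapts quickly (individual experience).
slow_opt = torch.optim.SGD(model.slow.parameters(), lr=1e-3)
fast_opt = torch.optim.SGD(model.fast.parameters(), lr=1e-1)
```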
Furthermore, we went beyond the traditional image benchmarks to test the proposed fast-and-slow continual learning framework on the online time series forecasting problem. We proposed Fast and Slow Networks (FSNet) as a radical approach to online time series forecasting that formulates it as a continual learning problem. FSNet leverages and improves upon the fast-and-slow learning principle to address two major time series forecasting challenges: fast adaptation to concept drifts and learning of recurring concepts. From experiments with both real and synthetic datasets, we found that FSNet shows promising capabilities in dealing with concept drifts and recurring patterns.
Finally, we conclude the dissertation with a summary of our contributions and an outline of potential future directions in continual learning research.
The new revenue standard (ASU 2014-09, codified in ASC 606 and ASC 340-40) establishes a comprehensive framework for accounting for contracts with customers and replaces most existing revenue recognition rules. It is an important milestone in the move towards the principles-based accounting standards proposed by the SEC in 2003. Using as-reported data from structured filings to construct aggregate accruals that are potentially affected by the new revenue standard (i.e., sales-related accruals), I find that the new revenue standard increases the quality of sales-related accruals, as measured by future cash flow predictability. The increased cash flow predictability comes not only from the guidance on contract revenue (ASC 606) but also from the guidance on contract costs (ASC 340-40). The effects concentrate among firms conducting long-term sales contracts, especially over longer forecast horizons. Further analysis shows that the new revenue standard also increases the combined information content of financial statements and capital market efficiency. However, the discretion under the principles-based new standard opens an avenue for earnings management when firms face strong manipulation incentives.
Reinforcement learning (RL) is a widely used approach to tackle problems in sequential decision making where an agent learns from rewards or penalties. However, in decision-making problems that involve safety or limited resources, the agent's exploration is often limited by constraints. To model such problems, constrained Markov decision processes (CMDPs) and constrained decentralized partially observable Markov decision processes (constrained Dec-POMDPs) have been proposed for single-agent and multi-agent settings, respectively. A significant challenge in solving constrained Dec-POMDPs is determining the contribution of each agent to the primary objective and to constraint violations. To address this issue, we propose a fictitious play-based method that uses Lagrangian relaxation to perform credit assignment for both primary objectives and constraints in large-scale multi-agent systems. Another major challenge in solving both CMDPs and constrained Dec-POMDPs is sample inefficiency, mainly resulting from finding valid actions that satisfy all constraints, which becomes even more difficult in large state and action spaces. Recent works in RL have attempted to incorporate domain knowledge from experts into the learning process through neuro-symbolic methods to address the sample inefficiency issue. We propose a knowledge compilation framework using decision diagrams that treats constraints as domain knowledge and introduces neuro-symbolic methods to support effective learning in constrained RL. First, we propose a zone-based multi-agent pathfinding (ZBPF) framework motivated by drone delivery applications, along with a neuro-symbolic method to efficiently solve the ZBPF problem with several domain constraints, such as simple-path and landmark constraints. Second, we propose another neuro-symbolic method to solve action-constrained RL where the action space is discrete and combinatorial. Empirical results show that our proposed approaches achieve better performance than standard constrained RL algorithms in several real-world applications.
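For intuition, a generic Lagrangian-relaxation step for constrained RL looks roughly like the sketch below: the constrained objective, maximize E[return] subject to E[cost] <= limit, is relaxed into an unconstrained one with a multiplier that is raised while the constraint is violated. This is a textbook-style sketch, not the dissertation's fictitious-play method.

```python
def lagrangian_step(avg_return, avg_cost, cost_limit, lam, lam_lr=0.01):
    """One dual-ascent step on the relaxed objective
    L(pi, lam) = E[return] - lam * (E[cost] - cost_limit)."""
    relaxed_objective = avg_return - lam * (avg_cost - cost_limit)
    # Raise the multiplier while the constraint is violated; keep it >= 0.
    lam = max(0.0, lam + lam_lr * (avg_cost - cost_limit))
    return relaxed_objective, lam

lam = 0.0
for avg_return, avg_cost in [(10.0, 3.0), (9.5, 2.4), (9.0, 1.8)]:  # toy rollouts
    objective, lam = lagrangian_step(avg_return, avg_cost, cost_limit=2.0, lam=lam)
```

In the multi-agent credit-assignment setting, per-agent contributions to both the return and the constraint costs would enter such an objective.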
This dissertation investigates the impact of acquisition activities on the current partners of the involved firms and the restructuring of their alliance portfolios. The first essay examines how acquisitions by a firm’s current alliance partners influence this firm’s subsequent alliance formation. The literature suggests that if a focal firm’s current alliance partners acquire targets that are in the same industry as the focal firm, the focal firm would be concerned about these alliance partners’ commitment, their increased bargaining power, and opportunistic behaviors. This essay contends that in response, the focal firm will form alliances with new partners to mitigate concerns about potential reduction in resource capture in its current alliances and to reduce dependence on current partners. This essay also theorizes how the focal firm’s status relative to its partners and its ego network density mitigate this tendency. This essay expands the knowledge about how firms react to alliance partners’ strategic activities. The second essay explores how acquisition premiums influence the acquirers' subsequent alliance formation. It reveals that acquirer firms paying higher acquisition premiums tend to engage in fewer new alliances afterward. However, this tendency diminishes when the relational embeddedness between the acquirer and the target increases, or when the acquirer holds a higher centrality or brokerage position. This essay expands the existing literature on acquisition premiums by shedding light on their influence on acquirers' external interorganizational relationships. The third essay examines the impact of acquisitions on the economic gains experienced by the common partners of both the acquirer and target firms, as evidenced by market reactions to the announcement of the acquisition. The hypothesis posits that acquisitions have a negative effect on the stock market returns of these common partners, attributed to a decrease in bargaining power. Additionally, this essay proposes that factors such as the number of other common partners and previous alliance experiences between the acquirer and target may mitigate this negative impact. This essay enriches the existing literature on acquisitions by providing new insights into implications for third parties and interorganizational relationships.
Given the rapid pace of urbanization, there is a pressing need to optimize urban logistics delivery operations for enhanced capacity and efficiency. Over recent decades, a multitude of optimization approaches have been put forth to address urban logistics challenges, encompassing routing and scheduling within both static and dynamic contexts. In light of the rising computational capabilities and the widespread adoption of machine learning in recent times, there is a growing body of research aimed at elucidating the seamless integration of data and machine learning within conventional urban logistics optimization models. Additionally, the ubiquitous utilization of smartphones and internet innovations presents novel research challenges in the realm of urban logistics, notably in the domains of last-mile delivery collaboration and on-demand food delivery services.
My PhD research is driven by these new demands, exploring how data-driven methods can improve urban logistics. This thesis will encompass a comprehensive discussion of my research conducted in three key domains: (1) collaborative urban delivery with alliances; (2) dynamic service area sizing optimization for on-demand food delivery services; and (3) optimization of dynamic matching time intervals for on-demand food delivery services.
This dissertation consists of three chapters on Search Models of Money.
The first chapter is a review of recent advances in Search Models of Money. It reviews the Lagos and Wright (2005) framework, the workhorse of many modern search models, with applications to models with competing media of exchange to fiat currency and models with money and credit. We trace the history of the development of search models of money from the first generation to the present day. We highlight recent developments that address puzzles such as the coexistence of money in an environment where an asset serves as both an alternative means-of-payment and a superior store of value. We look at search models of money with credit, which address the fact that in the original LW framework credit could not exist: agents are anonymous in the decentralized market (DM), while in the centralized market all agents can work with linear utility in hours, rendering credit unnecessary.
The second chapter explores the adoption and acceptance of alternative means-of-payment to fiat currency. We determine the inflation rate and transaction costs of adoption that encourage the adoption of an alternative means-of-payment. However, the buyer’s bargaining power must also be high enough for money and the asset to co-exist as means of payment; otherwise, buyers will choose to use money only under low inflation and the asset only under high inflation. We observe that when inflation is low and the cost of holding money is not great, then for a given fraction of sellers accepting the alternative means-of-payment, the benefit to the buyer of using the asset as an alternative means-of-payment is negative or zero, and buyers will not adopt the asset. At high inflation, when the asset is adopted and accepted as an alternative means-of-payment but the acceptance rate is low, welfare gains are limited because agents do not use much of the asset as an alternative means-of-payment. However, when the acceptance rate is high, the welfare gains are much higher. In equilibria where money and the asset co-exist as means of payment, increasing the seller’s acceptance rate of the asset encourages the adoption of the asset as means-of-payment at lower inflation rates.
The third chapter investigates consumer behaviour in an environment with two types of credit – secured and unsecured – and four types of agents: (1) low-income agents with high consumption needs, (2) high-income agents with high consumption needs, (3) low-income agents with low consumption needs, and (4) high-income agents with low consumption needs. Given that each agent's probability of access to financial markets or credit is strictly less than one, this gives rise to a total of eight heterogeneous agent types. As inflation increases, the cost of money increases, resulting in agents carrying less fiat currency and relying more on credit to finance their consumption needs. Low-income agents with high consumption needs are always the first to require credit, while in most situations high-income agents with low consumption needs never need credit. Credit relaxes agents' liquidity constraints, and as inflation increases, welfare decreases because agents carry less money and rely on credit to finance consumption needs. At high levels of inflation, agents start to have insufficient liquidity to obtain the optimal DM quantity of goods. Calibrating to US data, we find welfare losses ranging from 1% to 4% for every 0.1% increase in inflation. Because of our diverse types of agents, we are able to show that inflation affects high-consumption agents the most, especially those without access to credit.
Recent literature indicates that a lack of personal control predicts (social) cynicism, a negative view of others as self-interested and exploitative (Stavrova & Ehlebracht, 2018a, 2019). Despite the ostensibly robust nature of this relationship, I propose that the strength of the link between personal control and cynicism could be more variable than extant findings have suggested. In particular, I argue that variability in the control-cynicism link may be tracked (i.e., moderated) by the extent to which actors in a situation have corresponding or conflicting interests, with the effect of control on cynicism being attenuated when actors are perceived to have corresponding (vs. conflicting) interests. Furthermore, I reason that perceptions of vulnerability to exploitation should mediate the effect of control (and interests) on cynicism. Overall, the present research hypothesized a moderated mediation model linking personal control, interests, vulnerability, and cynicism. Four studies were conducted: three experiments that employed economic games (Study 1) and vignettes (Studies 2 and 3), and one large-scale, cross-cultural correlational study (Study 4). Findings were broadly consistent with the theoretical model: the link between control and cynicism was mediated by perceptions of vulnerability and was attenuated in situations with corresponding (vs. conflicting) interests. The implications and limitations of the current research are discussed. Overall, the findings suggest that shaping people’s perceptions of interests in a situation can be one useful way to help stem the cynicism that arises from a lack of personal control.
Essays on new business models in operations
This dissertation consists of three essays on problems of managing operations under emerging new business models, broadly related to anti-counterfeiting, car subscription programs, and on-demand ride-hailing services. Each of the following three chapters studies one type of new business model, with its opportunities and challenges, and builds analytical models to explore the implications for firms' operational decisions.
Chapter 2 studies the emergence of “super fakes” and investigates the effectiveness of a new anti-counterfeiting measure — converting counterfeiters to authorized suppliers. We employ a game-theoretic model to examine the interactions between a brand-name firm and its home supplier, and a counterfeiter who produces high-quality counterfeits and can potentially be converted to an authorized overseas supplier. We demonstrate that it is easier for the brand-name firm to combat counterfeiting through conversion than by driving the counterfeiter out of the market. We examine the impact of this new measure on consumer and social surplus, and find that it may hurt consumer surplus and does not always improve social surplus.
Chapter 3 studies the flexible versus dedicated technology choice and capacity investment decisions of a two-product manufacturing firm under demand uncertainty in the presence of subscription programs. The key feature of subscription programs is that a proportion of the customers allocated a particular product later switch to using the other product (if available). We build a two-stage stochastic program to study the optimal technology choice and capacity investment decisions, and the subsequent product allocation and reservation for each product. We investigate how the demand correlation and the switching proportion affect profitability under each technology and shape the optimal technology choice decision.
Chapter 4 studies an on-demand ride-hailing platform partnering with traditional taxi companies to expand the supply of drivers, and the government’s problem of regulating taxi drivers’ access to on-demand ride-hailing requests under such an emerging partnership. We examine the conditions under which taxi drivers participate in providing both street-hailing and on-demand ride-hailing services. We investigate whether and how the government should make regulatory decisions to maximize social welfare. We find that advocating the partnership by allowing taxi drivers “full access” to the platform may not be optimal and that regulation is needed.
Is this Behaviour Impressive or Repulsive? The Influence of Our Ecology on Our Social Evaluations
Sexual unrestrictedness, impulsivity, and a short-term orientation—these traits generally carry negative connotations and tend to be frowned upon. However, are they necessarily maladaptive? Evolutionary psychologists map these traits onto a behavioural cluster known as a fast life strategy. While a wide body of work has examined many types of prejudice (e.g., sexism, ageism, racism, classism, attractiveness bias, etc.), the literature has yet to examine prejudices against behaviours that lie on the life history strategy continuum. I propose that in our modern world, where life is relatively predictable and mortality rates are lower than in ancestral times, there exists a general negative bias towards fast (versus slow) life strategy traits (H1). Further, I expect this bias to be attenuated by perceptions of ecological harshness (i.e., mortality threats), because a fast strategy offers adaptive value under conditions of threat (H2). I test these hypotheses across several studies (total N = 1,500 participants from the USA). Study 1 assesses affective reactions to descriptions of a fast (vs. slow) life strategy. Study 2 provides a high-powered replication while examining an exploratory mediator, net perceived affordance. Study 3 adopts a full factorial experimental design, manipulating ecology perceptions and the life strategy of the target. The results generally support our hypotheses that people hold unfavourable views toward fast (versus slow) strategy behaviours, but this can be mitigated by ecology perceptions.
Learning Dynamic Multimodal Networks
Capturing and modeling relationship networks consisting of entity nodes and attributes associated with these nodes is an important research topic in network or graph learning. In this dissertation, we focus on modeling an important class of networks present in many real-world domains. These networks involve i) attributes from multiple modalities, also known as multimodal attributes; ii) multimodal attributes that are not static but take the form of time-series, i.e., dynamic multimodal attributes; and iii) relationships that evolve across time, i.e., dynamic networks. We refer to such networks as dynamic multimodal networks in this dissertation.
An example of a static multimodal network is one that consists of user interface (UI) design objects (e.g., UI element nodes, UI screen nodes, and element image nodes) as nodes, and links between these design objects (e.g., the links between UI screen nodes and their constituent UI element nodes) as edges. The design objects may be associated with UI screen and element images, text, numerical values, and categorical labels as attributes. An example of a dynamic network with dynamic multimodal attributes is a company network whose relationships (i.e., commercial relationships between company nodes) evolve across time, and whose company nodes may be associated with time-series of numerical stock prices, textual news, and categorical event attributes.
While there has been significant progress in the area of network or graph learning, few existing works focus on modeling such dynamic multimodal networks, or static networks with static or dynamic multimodal attributes.
In the first part of this dissertation, we focus on modeling networks with multimodal attributes. We develop four models that jointly capture static networks comprising different node and/or edge types with static multimodal and positional information. For model interpretability, we propose attention weight-based and learnable edge mask-based methods that enable end-users to understand and interpret the contribution of different parts of the network and information from different modalities. We show that our proposed models consistently out-perform other state-of-the-art models on six datasets across an extensive set of UI prediction tasks.
Next, in the second part of the dissertation, we focus on networks with dynamic multimodal attributes. We propose two models that jointly capture static networks comprising the same or different node types with dynamic attributes, i.e., time-series attributes, from different modalities, e.g., numerical stock price-related and textual news information, which may be local in nature (directly associated with specific nodes), or global in nature (relevant to multiple nodes). To address the noise inherent in multimodal time-series, we also propose knowledge-enrichment and curriculum learning methods. We show that our proposed models out-perform state-of-the-art network learning and time-series models on eight datasets across an extensive set of investment and risk management tasks and applications.
In the third and final part of the dissertation, we focus on modeling dynamic networks with dynamic multimodal attributes. We propose three models that capture dynamic implicit networks and/or dynamic explicit networks. The network nodes may be associated with local or global dynamic multimodal attributes of varying lengths and frequencies. To address noisy and non-stationary dynamic networks and attributes, we also propose self-supervised learning and concept learning methods. Aside from applying the proposed models to investment and risk management tasks and applications on another four datasets, we further apply them to environmental, social, and governance rating forecasting tasks on six datasets, and demonstrate that our proposed models out-perform state-of-the-art models on these tasks.
Essays on culture, institutions, and development
The interactive effects of societal and organizational cultural tightness on employee work-related outcomes
Many information retrieval tasks depend on knowledge graphs to return contextually relevant results for a query. We call them Knowledge-enriched Contextual Information Retrieval (KCIR) tasks, and they come in many different forms, including query-based document retrieval, query answering, and others. These KCIR tasks often require the input query to be contextualized by additional facts from a knowledge graph, and the context representation to be used for document or knowledge graph retrieval and prediction. In this dissertation, we present a meta-framework that identifies Contextual Representation Learning (CRL) and Contextual Information Retrieval (CIR) as the two key components of KCIR tasks.
We then address three research tasks related to the two KCIR components. In the first research task, we propose a VAE-based contextual representation learning method using a co-embedding attributed network structure that co-embeds knowledge and query context in the same vector space. The model shows superior downstream prediction accuracy compared with other VAE-based baseline models, with or without an external knowledge graph.
Next, we address the research task of solving a novel IR problem known as Contextual Path Retrieval (CPR). In this task, a knowledge graph path relevant to a given query and a pair of head and tail entities is to be retrieved from the background knowledge graph. We develop a transformer-based model consisting of a context encoder and a path encoder to solve the CPR task. Our proposed model, which includes the two encoders, shows a promising ability to retrieve contextual paths.
Finally, we address the Contextual Path Generation (CPG) task, which is similar to CPR except that the knowledge graph path to be returned may require inferred relation edges, since most knowledge graphs are incomplete in their coverage. For the CPG task, we propose both monotonic and non-monotonic approaches to generate contextual paths. Our experimental results demonstrate that the non-monotonic approach yields higher-quality knowledge graph paths.
Essays on corporate social (ir)responsibility, alliance formation and stock market reaction
Customers' waiting experiences are crucial in service and retail systems, and this thesis investigates their impact in various contexts. In service systems, long waiting times cause customers' no-show behavior and negative feedback from existing customers, which in turn result in low conversion and loss of revenue for service providers. However, waiting is not always negative. In online retail systems, with the innovation of sales models, long waits can buy online retailers more time to ease logistics pressure, although they may reduce customers' willingness to pay. Against this backdrop, in the first essay we investigate the influence of customers' waiting preferences and no-show behavior on appointment systems in the service setting. Second, this dissertation looks at the pricing incentivization of customers' waiting in online retail systems. Finally, we empirically measure the impact of financial incentives on last-mile operations to reduce customers' expected waiting time for delivery.
In the first essay, we conduct two lab experiments and build models to examine the impact of waiting on customers' appointment selection and no-show behavior in appointment systems. Appointment systems are widely adopted in many service organizations. The simplest and most common format is the Equally-Spaced (ES) system, in which the inter-appointment times between consecutive arrivals are equal. One major drawback of such a system is the long expected waiting time for later arrivals, which makes later appointment positions unappealing to customers. As a result, customers who take these positions are more likely to abandon their appointments, leading to a higher no-show rate. To address this issue, we examine a novel Equal-Waiting (EW) scheduling system under which the expected waiting times are equal across appointments. Through a series of controlled lab experiments, we establish that the EW system increases the attractiveness of later appointments and that customers who are willing to take these appointments are more likely to show up. We then incorporate this individual-level preference and no-show behavior into models to evaluate the impact on system-level performance. We find that, compared with the traditional ES system, the EW system can significantly increase customers' show-up rate and improve system utilization.
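To illustrate the ES-versus-EW contrast numerically, the toy Monte-Carlo sketch below estimates the expected wait at each appointment position for a given schedule; the exponential service times and the specific gaps are assumptions for illustration, not the essay's calibrated model.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_waits(gaps, mean_service=1.0, n_sims=20000):
    """Estimate the expected waiting time at each appointment position
    for a schedule defined by inter-appointment gaps (toy sketch)."""
    arrivals = np.concatenate([[0.0], np.cumsum(gaps)])
    waits = np.zeros(len(arrivals))
    for _ in range(n_sims):
        server_free_at = 0.0
        for i, t in enumerate(arrivals):
            start = max(t, server_free_at)          # wait if the server is busy
            waits[i] += start - t
            server_free_at = start + rng.exponential(mean_service)
    return waits / n_sims

# Equally-Spaced (ES): identical gaps -> later positions wait longer.
print(expected_waits([1.0, 1.0, 1.0]))
# Equal-Waiting (EW) idea: widen later gaps until expected waits even out.
print(expected_waits([1.0, 1.2, 1.3]))
```

Under the EW logic, the gaps would be tuned, analytically or numerically, until the estimated waits are equal across positions.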
In the second essay, we focus on the pricing incentivization of customers' waiting in a new flash-sale model, which is widely used by platforms such as JD.com and Lazada in seasonal promotions like Double 11. In flash sales, customers first pay a deposit and then wait several days to make the final payment; the product is shipped out after the final payment is made. The deposit determines the discount strength that the customer can enjoy due to the Double Deposit Inflation, and provides the retailer a signal of potential demand, allowing the retailer to reduce the logistics cost incurred from bottlenecked demand surges. The waiting that occurs during the transaction process may reduce customers' willingness to pay; however, it buys the online retailer more time to ease logistics pressure so that the logistics cost can be further reduced. Considering these important features of flash sales, we propose a pricing optimization model that jointly decides the optimal deposit and the product’s full price. We identify the value of introducing the flash-sale channel for the retailer and the conditions under which this value can be realized. We also provide the optimal flash-sale duration. In addition, our findings indicate the importance of considering the production cost in the optimal pricing strategy, especially for the linear demand function. In a case study, we calibrate our model with real data from an e-commerce company in China, and the results from a 5-fold cross-validation show that our model can predict demand well. Moreover, applying the pricing strategy proposed in this essay can dramatically improve profit.
The third essay delves into the impact of financial incentives on last-mile operations. Riders' responsiveness is crucial for service quality in last-mile delivery. To address the frequently occurring low responsiveness due to driver shortages or order congestion, most delivery platforms adopt financial incentives to attract more drivers. However, empirical research on the effectiveness of financial incentives and their spillover effects is lacking. Thus, the third essay examines the impact of financial incentives on last-mile operations, using transactional datasets obtained from a crowdsourced delivery platform.
Specifically, we employ a regression discontinuity design to identify the causal influence of financial incentives on drivers' order acceptance speed. Our results show that financial incentives significantly reduce drivers' order acceptance duration by 16.6%. Furthermore, temporal effects suggest that platforms can strategically terminate financial incentives ahead of schedule, as the impact persists for a certain period of time. From a network perspective, we also examine the spillover impact of neighboring stores' financial incentives on the performance of a focal store. Interestingly, our findings reveal opposing impacts that depend on the focal store's status. Specifically, the nearest store's financial incentives lengthen drivers' order acceptance duration at a focal store without financial incentives; however, the opposite spillover effect is observed when the focal store also offers financial incentives. To better understand the underlying mechanisms, we identify the siphon effect and the clustering effect as the key drivers of this phenomenon. This study contributes both theoretical and practical implications to the field of last-mile delivery.
Existing research on multiracials has examined how multiracials develop different racial identities. However, empirical research on how multiracials manage and integrate their identities, as well as the impact of doing so, is limited. This dissertation examined key antecedents and consequences associated with the unique process that multiracials undergo to achieve a positive identity via Multiracial Identity Integration (MII). In Study 1, we examined the link between MII, psychological well-being, and cognitive capacity. Results revealed a positive association between MII and psychological well-being as well as some cognitive capacity outcomes. Study 2 replicated the same relationship between MII and psychological well-being/cognitive capacity outcomes. Additionally, multiracials’ experiences with identity denial and identity inquiry were negatively associated with their MII. The relationships between identity denial and psychological well-being/cognitive capacity outcomes were mediated by MII. Studies 3 and 4 examined, respectively, whether MII would moderate the interpretation of identity-related questions and whether manipulated experiences of identity denial and identity inquiry would impact multiracials’ MII. The findings from both studies were nonsignificant. Together, this dissertation illuminates the antecedents and consequences associated with a healthy multiracial identity via MII. Theoretical and practical implications are discussed.
Most traditional machine learning or deep learning methods are based on the premise that training data and test data are independently and identically distributed, i.e., IID. However, this is an ideal situation. In real-world applications, the test set and training data often follow different distributions, which we refer to as the out-of-distribution (OOD) setting. As a result, models trained with traditional methods often suffer an undesirable performance drop on OOD test sets. It is necessary to develop techniques that solve this problem for real applications. In this dissertation, we present four works on OOD problems in Natural Language Processing (NLP), which can be grouped into two sub-categories: adversarial robustness and cross-lingual transfer.
Social Attention in Realistic Work Environments
Social attention – the process by which individuals select which aspect of the social world to mentally process – is a key antecedent to all organisational behaviour in groups. This central role of attention has long been appreciated by organisational theorists, but our understanding of this core cognitive process has been hampered by a lack of empirical evidence. To create a method through which organisational scholars can study social attention, this dissertation combines cognitive science measures of attention with recent innovations from social and applied psychology using virtual reality to study naturalistic social behaviour (Chapter 1). This method is then applied to investigate the factors that determine whether individuals can capture the attention of their audience at work – e.g., charismatic job candidates receiving more attention than non-charismatic job candidates – and the downstream effects this has on individual-level outcomes (Chapter 2). These biases in social attention are then incorporated into models of group decision-making to demonstrate how micro-level attentional biases in group decision-making scenarios can translate into macro-level decision biases and thus sub-optimal decision outcomes (Chapter 3). The dissertation concludes with an inductive theory of “Socially Bounded Rationality” that hopes to spur future research on this topic.
Regulating by New Technology: The Impacts of the SEC Data Analytics on the SEC Investigations
While the use of data analytics has been increasingly emphasized by the Securities and Exchange Commission (SEC) in recent years, there is little research on whether the investment in data analytics achieves its goal of improving enforcement efficiency. This study examines the effects of the SEC regional offices’ use of data analytics on their investigation outcomes. Consistent with data analytics helping the SEC identify the more suspicious cases to investigate and facilitating the formal investigation process, I find that the SEC’s use of data analytics is associated with a 12% increase in the SEC’s investigation success rate. This improvement is larger for more complex firms and for firms that are geographically distant from the SEC regional offices. In addition, I find that firms are less likely to commit fraud after the SEC’s use of data analytics because of the higher perceived detection likelihood. Additional tests suggest that the investigation time is shorter, and the detected fraud is more severe, after the SEC’s use of data analytics. Overall, the results provide evidence that the SEC’s use of data analytics increases its enforcement efficiency and reduces firms’ incentives to commit fraud.
Recommendation explanations help users make sense of recommendations, increasing the likelihood of adoption. Here, we are interested in mining product textual data, an unstructured data type coming from manufacturers, sellers, or consumers and appearing in many places (titles, summaries, descriptions, reviews, questions and answers, etc.), as a rich source of information for explaining recommendations. As the explanation task can be decoupled from the recommendation objective, we can categorize recommendation explanation into the integrated approach, which uses a single interpretable model to produce both recommendations and explanations, and the pipeline approach, which uses a post-hoc explanation model to produce explanations for a black-box or an explainable recommendation model. In addition, we can view a recommendation explanation as evaluative, assessing the quality of a single product, or comparative, comparing the quality of a product to one or more other products. In this dissertation, we present research on both integrated and pipeline approaches for recommendation explanation, as well as on both evaluative and comparative recommendation explanations.
Reinforcement Learning Approach to Coordinate Real-World Multi-Agent Dynamic Routing and Scheduling
In this dissertation, we study new variants of routing and scheduling problems motivated by real-world problems from the urban logistics and law enforcement domains. In particular, we focus on two key aspects: dynamics and multiple agents. While routing problems such as the Vehicle Routing Problem (VRP) are well studied in the Operations Research (OR) community, in real-world route planning today, initially planned routes and schedules may be disrupted by dynamically occurring events. In addition, routing and scheduling cannot be done in silos due to the presence of other agents, which may be independent and self-interested.
This dissertation discusses and proposes new methodologies that incorporate relevant techniques from the field of AI (more precisely, Reinforcement Learning (RL) and Multi-Agent Systems (MAS)) to supplement and complement classical OR techniques in solving dynamic and multi-agent variants of routing and scheduling problems. This dissertation makes three main contributions. First, to address the dynamic aspect of routing and scheduling problems, we propose an RL-based approach that combines Value Function Approximation (VFA) and a planning heuristic to jointly learn assignment/dispatch and rerouting/rescheduling policies without the need to decompose the problem or action into multiple stages. Second, to address the multi-agent aspect, we formulate the problem as a strategic game and propose a scalable, decentralized, coordinated planning approach based on iterative best response. Lastly, to address both the dynamic and multi-agent aspects, we present a pioneering effort on a cooperative Multi-Agent RL (MARL) approach that solves the multi-agent dynamic routing and scheduling problem directly, without any decomposition step. This contribution builds upon our two earlier contributions by extending the proposed VFA method to the multi-agent setting and incorporating the iterative best response procedure as a decentralized optimization heuristic and an explicit coordination mechanism.
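The coordination idea in the second contribution can be sketched as follows; the agent interface (best_response, cost) is assumed for illustration and is not the dissertation's exact formulation.

```python
def iterative_best_response(agents, initial_plans, max_rounds=50):
    """Decentralized coordination: each agent re-optimizes its own route
    while the others' plans are held fixed, repeating until no agent can
    improve, i.e., an approximate equilibrium (illustrative sketch)."""
    plans = dict(initial_plans)
    for _ in range(max_rounds):
        changed = False
        for agent in agents:
            others = {name: p for name, p in plans.items() if name != agent.name}
            candidate = agent.best_response(others)  # assumed per-agent solver
            if agent.cost(candidate, others) < agent.cost(plans[agent.name], others):
                plans[agent.name] = candidate
                changed = True
        if not changed:   # fixed point: mutual best responses
            break
    return plans
```

In the MARL contribution, such a best-response procedure serves as the explicit coordination mechanism layered on top of the learned value functions.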
Despite previous research demonstrating the negative effects of IADL limitations on depressive symptoms in older adults, there is a dearth of research examining the underlying mechanism of this relationship. Drawing on the stress process model, the present research investigated whether purpose in life and resilience, as psychological resources, would mediate the relation between IADL limitations and depressive symptoms. We recruited 111 community-dwelling older adults (ages 54-85) and examined a parallel mediation model using the PROCESS macro. Our results revealed that purpose in life and resilience fully mediated the relation between IADL limitations and depressive symptoms. These mediation effects held true when we adjusted for notable covariates. Our findings underscore the crucial roles of purpose in life and resilience in explaining the relation between IADL limitations and depressive symptoms in older adults. Implications and opportunities for intervention programs to bolster purpose in life and resilience are discussed.
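To illustrate the analysis design, below is a minimal sketch of a parallel two-mediator model with bootstrapped indirect effects, analogous in spirit to PROCESS Model 4; the column names (iadl, purpose, resilience, depress) are hypothetical stand-ins, and covariate adjustment is omitted for brevity.

```python
# Minimal sketch of a parallel mediation analysis with two mediators and
# percentile-bootstrap confidence intervals for the indirect effects.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def indirect_effects(df):
    a1 = smf.ols("purpose ~ iadl", data=df).fit().params["iadl"]
    a2 = smf.ols("resilience ~ iadl", data=df).fit().params["iadl"]
    m = smf.ols("depress ~ iadl + purpose + resilience", data=df).fit()
    return a1 * m.params["purpose"], a2 * m.params["resilience"]

def bootstrap_ci(df, n_boot=5000, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.array([
        indirect_effects(df.sample(len(df), replace=True,
                                   random_state=int(rng.integers(1 << 31))))
        for _ in range(n_boot)
    ])
    return np.percentile(draws, [2.5, 97.5], axis=0)  # 95% CI per mediator
```

Full mediation corresponds to both indirect-effect intervals excluding zero while the direct effect of iadl on depress becomes nonsignificant.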
M&A advisor banks are privy to valuable and sensitive information through their service. I examine whether M&A advisor banks exploit such private information to trade in peers of M&A firms. I provide evidence that M&A advisor banks gain higher profits through their trading in peers of M&A firms, compared with non-advisor banks. Such informed trading is more intensive for M&A deals with larger impacts on peer firms (i.e., when the deal value is more significant for peer firms; when the M&A firms have larger market share in the industry; and when the stock price reactions of peer firms are stronger). Further analysis reveals that prior business relationships with peer firms enable M&A advisor banks to engage in such informed trading. In addition, M&A advisor banks’ performance pressure incentivizes them to utilize private M&A information for trading, while reputation concerns deter such informed trading in peer firms.
Corporate social responsibility (CSR) is receiving significant attention from both academics and businesses. However, CSR and economic goals are still perceived as conflicting. To address this gap, this dissertation puts together three essays that inform the instrumental value of CSR and show how CSR can be compatible with economic objectives.
Chapter Two reveals that CSR reporting has strategic value for firms, improving firms' relationships with stakeholders and facilitating the development of sustainable development capabilities. Firms are responsive to shareholders' and governments' demands for reporting. As society increasingly demands CSR reporting, overlooking the costs and benefits of reporting is costly and counterproductive to sustainable development goals. I propose a collaborative, parsimonious, and fine-grained regulatory approach, focusing on issue saliency and issue-specific regulations, to address this gap.
Chapter Three explores the interplay between financial performance and CSR in business practices. Drawing on behavioral economics, it investigates how relational rationality, which emphasizes preserving stakeholder relationships, and economic rationality, which prioritizes conserving resources, influence CSR strategizing. The paper highlights the need for CSR strategizing to facilitate resource access for profitability-related projects and to complement the firm's overall strategy. The essay conceptualizes the resource conservation and relationship preservation mechanisms, providing insights into how firms allocate resources between economic and social objectives.
Chapter Four examines the contingencies of the CSR-performance relationship. Integrating trust research with signaling theory, this paper proposes that perceived trustworthiness influences the credibility of CSR signals. Specifically, this study examines how the propensity to trust and category-based trust moderate the association between CSR and firm performance. The paper addresses an overlooked gap: stakeholders may perceive firms' CSR differently across countries, thereby influencing the CSR-performance relationship; this insight informs businesses' investments in trust-building strategies.
In this thesis, we develop novel nonparametric estimation techniques for two distinct classes of models: (1) Generalized Additive Models with Unknown Link Functions (GAMULF) and (2) Generalized Panel Data Transformation Models with Fixed Effects. Both models avoid parametric assumptions on their respective link or transformation functions, as well as the distribution of the idiosyncratic error terms.
The first chapter aims to provide an in-depth and systematic introduction to cross-sectional and panel-data nonparametric transformation models, encompassing practical applications, a diverse range of estimation techniques, and the study of asymptotic properties. We discuss the advantages and limitations of these models and estimation methods, delving into the latest advancements and innovations in the field. Furthermore, we propose a potential approach to mitigate the curse of dimensionality in the context of fully nonparametric transformation models with fixed effects in panel-data settings.
The second chapter proposes a three-stage nonparametric least squares (NPLS) estimation procedure for the additive functions in the GAMULF. In the first stage, we estimate the conditional expectation by local-linear kernel regression and then apply a matching method to the spline series to obtain initial estimators. In the second stage, we use local-polynomial kernel regression to estimate the link function. In the third stage, given the estimators from Stages 1 and 2, we apply local-linear kernel regression to refine the initial estimator. The great advantage of such a procedure is that the estimators obtained at all stages have closed-form expressions, which overcomes the computational hurdle faced by existing estimators of the GAMULF model.
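The closed-form property comes from the fact that each local fit is a weighted linear least-squares problem. Below is a minimal sketch of the local-linear kernel regression building block used across the stages; the Gaussian kernel and the bandwidth h are illustrative assumptions.

```python
# Minimal sketch of a local-linear kernel regression estimator: fit a weighted
# line around the evaluation point x0; the intercept is the fitted value.
import numpy as np

def local_linear(x0, x, y, h):
    """Estimate E[y | x = x0] with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)            # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])    # intercept + local slope
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]                                    # fitted value at x0

# Toy usage: recover a smooth regression function from noisy data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
print(local_linear(0.5, x, y, h=0.1))  # close to sin(pi) = 0
```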
The third chapter proposes a multiple-stage Local Maximum Likelihood Estimator (LMLE) for the structural functions in the generalized panel data transformation model with fixed effects. In the first stage, we apply the regularized logistic sieve method to estimate the sieve coefficients associated with the approximation of a composite function, and then apply a matching method to obtain initial consistent estimators of the additive structural functions. In the second stage, we apply the local polynomial method to estimate a certain composite function and its derivatives for later use. In the third stage, we apply the local linear method to obtain refined estimators of the additive structural functions based on the estimators obtained in Stages 1 and 2. The greatest advantage is that all minimization problems are convex, which overcomes the computational hurdle faced by existing approaches to the generalized panel data transformation model.
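As a rough illustration of the first-stage idea, the sketch below approximates an unknown function with a sieve (here, a simple polynomial basis) and fits a ridge-regularized logistic model; the basis, regularization level, and data-generating process are hypothetical stand-ins, not the chapter's actual specification.

```python
# Minimal sketch of a regularized logistic sieve fit: expand the regressor in
# basis functions, then solve a convex penalized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=500)
basis = np.column_stack([x**k for k in range(1, 6)])   # polynomial sieve terms
p = 1.0 / (1.0 + np.exp(-np.sin(3 * x)))               # true nonlinear index
y = rng.binomial(1, p)

fit = LogisticRegression(penalty="l2", C=1.0).fit(basis, y)  # convex problem
print(fit.coef_)   # estimated sieve coefficients
```

Convexity of each such stage is what makes the overall multi-stage procedure computationally tractable.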
The final estimates of the additive terms in the two models achieve the optimal one-dimensional convergence rate, asymptotic normality, and oracle efficiency. Monte Carlo simulations demonstrate that our new estimators perform well in finite samples.
The thesis demonstrates the effectiveness of the proposed nonparametric estimation techniques in addressing the complexities of generalized additive models with unknown link functions and panel data transformation models with fixed effects.
This dissertation consists of two papers that contribute to the estimation and inference theory of panel data models with two-way slope heterogeneity. The first chapter considers the panel quantile regression model with slope heterogeneity along both the individual and time dimensions. By modelling this two-way heterogeneity with a low-rank slope matrix, the slope coefficient can be estimated via nuclear norm regularization followed by sample splitting, row- and column-wise quantile regression, and debiasing. The inferential theory for the final slope estimator, along with its factors and factor loadings, is derived. Two specification tests are proposed: the first tests whether the slope coefficient is constant over one dimension (individual or time), without assuming that the slope coefficient is homogeneous over the other dimension, when the true rank of the slope matrix equals one; the second tests whether the slope coefficient follows an additive structure when the true rank of the slope matrix equals two. The second paper focuses on estimation and inference in the linear panel model with interactive fixed effects and two-way slope heterogeneity. Specifically, individual coefficients are allowed to follow a latent group structure cross-sectionally, and this structure can change after an unknown structural break. A multi-stage estimation algorithm, involving nuclear norm regularization, break detection, and a K-means procedure, is proposed to estimate the break date, the number of groups, and the group structure. Under some regularity conditions, the break date estimator, the number-of-groups estimator, and the group structure estimator are shown to enjoy the oracle property. Monte Carlo studies and empirical applications illustrate the finite-sample performance of the proposed algorithms and estimators.
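The workhorse behind nuclear-norm-regularized estimation of a low-rank slope matrix is singular-value soft-thresholding, the proximal operator of the nuclear norm. The following is a minimal sketch under an illustrative regularization level lambda; it is not the chapters' full estimation algorithm.

```python
# Minimal sketch of singular-value soft-thresholding: shrink the singular
# values of a matrix toward zero, yielding a low-rank estimate.
import numpy as np

def svt(M, lam):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - lam, 0.0)   # soft-threshold the singular values
    return (U * s) @ Vt            # low-rank reconstruction

# Toy usage: a rank-1 slope matrix plus noise is shrunk back toward low rank.
rng = np.random.default_rng(0)
B = np.outer(rng.normal(size=30), rng.normal(size=20))
B_hat = svt(B + 0.1 * rng.normal(size=B.shape), lam=1.0)
print(np.linalg.matrix_rank(B_hat, tol=1e-8))
```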
Graph-structured data are ubiquitous across numerous real-world contexts, encompassing social networks, commercial graphs, bibliographic networks, and biological systems. Delving into the analysis of these graphs can yield significant understanding pertaining to their corresponding application fields. Graph representation learning offers a potent solution to graph analytics challenges by transforming a graph into a low-dimensional space while preserving its information to the greatest extent possible. This conversion into low-dimensional vectors enables the efficient computation of subsequent graph algorithms. The majority of prior research has concentrated on deriving node representations from a single, static graph. However, numerous real-world situations demand rapid generation of representations for previously unencountered nodes, novel edges, or entirely new graphs. This inductive capability is vital for high-performance machine learning systems that operate on ever-changing graphs and consistently encounter unfamiliar nodes. The inductive graph representation presents considerable difficulty when compared to the transductive setting, as it necessitates the alignment of new subgraphs containing previously unseen nodes with an already trained neural network. We further investigate inductive graph representation learning through three distinct angles: (1) Generalizing Graph Neural Networks (GNNs) across graphs, addressing semi-supervised node classification across multiple graphs; (2) Generalizing GNNs across time, focusing on temporal link prediction; and (3) Generalizing GNNs across tasks, tackling various low-resource text classification tasks.
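The key to the inductive setting is that the learned parameters act on node features rather than node identities, so the same layer can embed nodes never seen during training. Below is a minimal sketch of one GraphSAGE-style mean-aggregation layer; the weight matrices and feature dictionaries are hypothetical, and no training loop is shown.

```python
# Minimal sketch of one inductive mean-aggregation layer: a node's new
# representation combines its own features with the mean of its neighbors'.
import numpy as np

def sage_layer(features, neighbors, W_self, W_neigh):
    """features: dict node -> feature vector; neighbors: dict node -> list."""
    out = {}
    for v, x in features.items():
        nbrs = neighbors.get(v, [])
        agg = (np.mean([features[u] for u in nbrs], axis=0)
               if nbrs else np.zeros_like(x))
        h = W_self @ x + W_neigh @ agg   # weights are feature-based, not id-based
        out[v] = np.maximum(h, 0.0)      # ReLU nonlinearity
    return out
```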
In the current age, rapid growth in sectors like finance and transportation involves fast digitization of industrial processes. This creates a huge opportunity for next-generation artificial intelligence systems with multiple agents operating at scale. Multiagent reinforcement learning (MARL) is the field of study that addresses problems in multiagent systems. In this thesis, we develop and evaluate novel MARL methodologies that address the challenges of large-scale multiagent systems in cooperative settings. One of the key challenges in cooperative MARL is the problem of credit assignment. Many previous approaches to this problem rely on each agent's individual trajectory, which limits scalability to a small number of agents. Our proposed methodologies are based solely on aggregate information, which provides the benefit of high scalability: the dimension of the key statistics does not change as the agent population grows. In this thesis we also address other challenges that arise in MARL, such as variable-duration actions, and present some preliminary work on credit assignment under a sparse reward model.
The first part of this thesis investigates the challenges in a maritime traffic management (MTM) problem, one of the motivating domains for large-scale cooperative multiagent systems. The key research question is how to coordinate vessels in a heavily trafficked maritime environment to increase the safety of navigation by reducing traffic congestion. The MTM problem is an instance of cooperative MARL with a shared reward: vessels share the same penalty cost for any congestion, so the problem suffers from credit assignment. We address it by developing a vessel-based value function using aggregate information, which performs effective credit assignment by measuring the effectiveness of an agent's policy while filtering out the contributions of other agents. Although this first approach achieved promising results, its ability to handle variable-duration actions, a crucial feature of the problem domain, is rather limited. We therefore address this challenge using hierarchical reinforcement learning, a framework for control with variable-duration actions. We develop a novel hierarchical learning-based approach for the maritime traffic control problem, introducing the notion of a meta action, a high-level action that takes a variable amount of time to execute, and proposing an individual meta value function based on aggregate information that effectively addresses the credit assignment problem.
We also develop a general approach to the credit assignment problem for large-scale cooperative multiagent systems in both discrete- and continuous-action settings. We extend a shaped-reward approach known as difference rewards (DR) to address the credit assignment problem. DRs are an effective tool for tackling this problem, but their computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information. One limitation of this DR-based approach to credit assignment is that it relies on learning a good approximation of the reward model; in a sparse reward setting, agents receive no informative immediate reward signal until the episode ends, so the shaped-reward approach is not effective in the sparse-reward case. In this thesis, we also present some preliminary work in this direction.
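To illustrate why aggregate information makes difference rewards scalable, the sketch below computes an agent's difference reward from occupancy counts alone: the counterfactual (replacing the agent's action with a default action) only perturbs the counts, never other agents' trajectories. The congestion-style reward function and default action are hypothetical.

```python
# Minimal sketch of the difference-rewards idea with aggregate statistics:
# an agent's credit is the change in global reward when its action is
# replaced by a default action in the aggregate counts.
from collections import Counter

def global_reward(counts):
    # Hypothetical congestion-style reward: penalize squared zone occupancy.
    return -sum(n * n for n in counts.values())

def difference_reward(actions, agent, default_action):
    counts = Counter(actions.values())   # aggregate statistic over all agents
    cf = counts.copy()                   # counterfactual counts
    cf[actions[agent]] -= 1
    cf[default_action] += 1
    return global_reward(counts) - global_reward(cf)

# Toy usage: three vessels choosing zones; credit agent "a" for its choice.
print(difference_reward({"a": "z1", "b": "z1", "c": "z2"}, "a", "z2"))
```

Because only the counts enter the computation, the cost of the counterfactual does not grow with the number of agents.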
Information Acquisition and Market Friction
My dissertation consists of three papers related to information diversity, acquisition, and asymmetry. One part of the dissertation explores the implications of interactions among different market participants and the subsequent price efficiency in the stock market. The empirical findings document the information diversity between individual and institutional investors, as well as an important channel through which retail investors obtain useful information: insider filings. The remaining part investigates the information asymmetry between issuers and naive investors in the cryptocurrency market.

In Chapter 2, I aggregate trading signals from hedge funds and retail investors in order to examine their information diversity and their combined informational role in the stock market. I show that incorporating signals from both groups is necessary to identify firm-level information. Stocks that reflect consistent trading between the two groups exhibit strong return predictability without reversal. When trading in the opposite direction to retail investors, hedge funds cannot yield any significant return, even over a longer horizon. I also document that consistent trading between the two groups significantly predicts firm fundamentals, informational events, and market reactions, and helps alleviate stock-level mispricing. Overall, the findings suggest that relying solely on signals from hedge funds is incomplete, as there remain signals from retail investors who are informed about different aspects of stock fundamentals.

In Chapter 3, we examine the trading patterns of retail investors following insider trading and the corresponding price impact. Retail investors follow the opportunistic purchases by insiders, but not their routine purchases. Abnormal retail downloads of Form 4 filings from the EDGAR database also increase for opportunistic insider purchases. Neither investor attention nor common information such as earnings announcements or analyst forecast revisions explains the results. Moreover, for stocks with opportunistic insider purchases, those that retail investors bought yield higher cumulative abnormal returns than those that retail investors sold. The effect is mostly driven by the information component of the retail trades, rather than liquidity provision or temporary price pressure. Variance ratio tests also suggest price efficiency improvements for stocks bought by retail investors following opportunistic insider purchases. The evidence is mostly consistent with retail investors learning from opportunistic insider purchases and their trading helping expedite price discovery.

In Chapter 4, we study the economics of financial scams by investigating the market for initial coin offerings (ICOs) using point-in-time data snapshots of 5,935 ICOs. Our evidence indicates that ICO issuers strategically screen for naïve investors by misrepresenting the characteristics of their offerings across listing websites. Misrepresented ICOs have higher scam risk, and the misrepresentations are unlikely to reflect unintentional mistakes. Using on-chain analysis of Ethereum wallets, we find that less sophisticated investors are more likely to invest in misrepresented ICOs. We estimate that 40% of the ICOs (U.S. $12 billion) in our sample are scams. Overall, our findings uncover how screening strategies are used in financial scams and reinforce the importance of conducting due diligence.
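For the variance ratio tests referenced in Chapter 3, the following is a minimal sketch of the statistic in the spirit of Lo and MacKinlay; it omits the overlapping-return bias corrections and standard errors used in formal tests.

```python
# Minimal sketch of a variance ratio statistic: under a random walk, the
# variance of q-period log returns is q times the variance of one-period
# returns, so VR(q) near 1 is consistent with greater price efficiency.
import numpy as np

def variance_ratio(prices, q):
    prices = np.asarray(prices, dtype=float)
    r1 = np.diff(np.log(prices))                     # one-period log returns
    rq = np.log(prices[q:]) - np.log(prices[:-q])    # q-period log returns
    return rq.var(ddof=1) / (q * r1.var(ddof=1))
```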
Nowadays, software question and answer (SQA) data has become a treasure trove for software engineering, as it contains a huge volume of programming knowledge. This knowledge can be interpreted in many different ways to support various software activities, such as code recommendation and program repair. In this dissertation, we interpret SQA data by addressing three novel research problems.
The first research problem concerns linkable knowledge unit prediction. In this problem, a question and its answers within a Stack Overflow post are considered a knowledge unit (KU). KUs often contain semantically relevant knowledge and are thus linkable for different purposes. Being able to classify different classes of linkable knowledge units would support more targeted information needs when users search or explore linkable knowledge. Compared with the approaches proposed in prior work, we design a simpler yet more effective machine learning model to address the problem. Moreover, we identify limitations of the dataset used in previous work and construct a new one of larger size and higher diversity. Our experimental results show that our model significantly outperforms the state-of-the-art approaches.
The second research problem is about distributed representations for Stack Overflow posts. In this dissertation, we propose Post2Vec, a specialized deep learning architecture that extracts distributed representations of Stack Overflow posts. To evaluate Post2Vec, we first investigate its end-to-end effectiveness in the tag recommendation task. We observe that Post2Vec achieves significant improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins.
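For readers unfamiliar with the metric, below is a minimal sketch of one common formulation of F1-score@5 for tag recommendation: per-post precision and recall are computed over the top-5 predicted tags, and the resulting F1 scores are averaged. Other variants exist, so this is an illustration rather than the dissertation's exact definition.

```python
# Minimal sketch of F1-score@5 for tag recommendation.
def f1_at_5(predicted, actual):
    """predicted: list of ranked tag lists; actual: list of true tag sets."""
    scores = []
    for ranked, truth in zip(predicted, actual):
        top5 = set(ranked[:5])
        hits = len(top5 & set(truth))
        p = hits / 5                              # precision@5
        r = hits / len(truth) if truth else 0.0   # recall@5
        scores.append(2 * p * r / (p + r) if p + r > 0 else 0.0)
    return sum(scores) / len(scores)

print(f1_at_5([["python", "pandas", "numpy", "csv", "io"]],
              [{"python", "pandas"}]))
```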
The third research problem is about answer summary generation for technical questions. We formulate the task as query-focused multi-answer-post summarization for a given technical question. We conduct user studies to evaluate the quality of the answer summaries generated by our approach, AnswerBot. The user study results demonstrate that the generated answer summaries are relevant, useful, and diverse.
The code hosting platform GitHub has gained immense popularity worldwide in recent years, with over 200 million repositories hosted as of June 2021. Due to its popularity, it has great potential to facilitate widespread improvements across many software projects. Naturally, GitHub has attracted much research attention, and the source code in the various repositories it hosts also provides opportunities to apply techniques and tools developed by software engineering researchers over the years. However, much of the existing body of research applicable to GitHub focuses on the code quality of software projects and ways to improve it. Fewer works focus on potential ways to improve the quality of GitHub repositories through other aspects, although the quality of a software project on GitHub is also affected by factors outside the project's source code, such as documentation, the project's dependencies, and its pool of contributors.
The three works that form this dissertation investigate aspects of GitHub repositories beyond code quality and identify specific improvements that can be applied to a wide range of GitHub repositories. In the first work, we aim to systematically understand the content of README files in GitHub software projects and develop a tool that can process them automatically. The work begins with a qualitative study of 4,226 README file sections from 393 randomly sampled GitHub repositories, which reveals that many README files describe the ``What'' and ``How'' of the software project but often omit the project's purpose and status. This is followed by the development and evaluation of a multi-label classifier that can predict eight different README content categories with an F1 of 0.746. From our subsequent evaluation of the classifier, which involved twenty software professionals, we find that adding labels generated by the classifier to README files eases information discovery.
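As a rough illustration of the multi-label setup (a section can carry several content categories at once), the sketch below trains a TF-IDF plus one-vs-rest logistic regression classifier on toy data. This is an illustrative stand-in, not the classifier developed in the dissertation; the example labels are hypothetical.

```python
# Minimal sketch of a multi-label README-section classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

sections = [
    "Run pip install mypkg to install the package.",
    "This project is licensed under the MIT license.",
    "Contributions are welcome; please open a pull request.",
]
labels = [["How"], ["License"], ["Contribution"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                    # binary indicator matrix
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(sections, Y)                             # one binary model per label
print(mlb.inverse_transform(clf.predict(["How do I install this?"])))
```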
Our second work focuses on the characteristics of vulnerabilities in open-source libraries used by 450 software projects on GitHub written in Java, Python, and Ruby. Using an industrial software composition analysis tool, we scanned every version of the projects after each commit made between November 1, 2017 and October 31, 2018. Our subsequent analyses of the discovered library names, versions, and associated vulnerabilities reveal, among other things, that ``Denial of Service'' and ``Information Disclosure'' vulnerability types are common. In addition, we find that most of the vulnerabilities persist throughout the observation period, and that attributes such as project size, project popularity, and the experience level of commit authors do not translate to better or worse handling of vulnerabilities in dependent libraries. Based on the findings in the second work, we list a number of implications for library users, library developers, and researchers, and provide several concrete recommendations. These include recommendations to simplify projects' dependency sets, as well as to encourage research into ways to automatically recommend libraries known to be secure to developers.
In our third work, we conduct a multi-region geographical analysis of gender inclusion on GitHub. We use a mixed-methods approach involving a quantitative analysis of the commit authors of 21,456 project repositories, followed by a survey strategically targeted to developers in various regions worldwide and a qualitative analysis of the survey responses. Among other findings, we discover differences in diversity levels between regions, with Asia and the Americas being the highest. We also find no strong correlation between the gender and geographic diversity of a repository's commit authors. Further, from our survey respondents worldwide, we identify barriers to and motivations for contributing to open-source software. The results of this work provide insights into the current state of gender diversity in open-source software and potential ways to improve the participation of developers from under-represented regions and genders, and subsequently improve the open-source software community in general. Such potential ways include the creation of codes of conduct, proximity-based mentorship schemes, and the highlighting of women and regional role models.
In recent years, we have witnessed significant progress in building systems with artificial intelligence. However, despite advancements in machine learning and deep learning, we are still far from achieving autonomous agents that can perceive multi-dimensional information from the surrounding world and converse with humans in natural language. Towards this goal, this thesis is dedicated to building intelligent systems for the task of video-grounded dialogue. Specifically, in a video-grounded dialogue, a system is required to hold a multi-turn conversation with humans about the content of a video. Given an input video, a dialogue history, and a question about the video, the system has to understand the contextual information of the dialogue, extract relevant information from the video, and construct a dialogue response that is both contextually relevant and video-grounded. Compared to related research domains in computer vision and natural language processing, the video-grounded dialogue task raises challenging requirements, including: (1) language reasoning over multiple turns: the ability to understand contextual information from dialogues, which often contain linguistic dependencies from turn to turn; (2) visual reasoning in spatio-temporal space: the ability to extract information from videos, which contain both spatial and temporal variations that characterize object appearance and actions; and (3) language generation: the ability to acquire natural language and generate responses with both contextually relevant and video-grounded information. Towards building an intelligent system for the video-grounded dialogue task, we introduced a neural model, the Multimodal Transformer Network (MTN), that can be trained end-to-end to reason over both dialogue and video inputs and decode a natural language response. The architecture was tested on the established Audio-Visual Scene-Aware Dialogue (AVSD) benchmark and achieved superior performance over other neural-based systems. Despite this success, we found that MTN is not specifically designed for scenarios that require sophisticated visual or language reasoning. To further improve models' visual reasoning capability, we introduced BiST, a Bidirectional Spatio-Temporal Reasoning approach that extracts relevant visual cues from videos in both the spatial and temporal dimensions. This approach achieved consistent performance in both quantitative and qualitative evaluations. However, our findings show that in many scenarios, systems fail to learn the contextual information of the dialogue, which may lead to incorrect or incoherent responses. To address this limitation, we turned our attention to models' language reasoning capability. We proposed PDC, a path-based reasoning approach for dialogue context, which requires systems to learn to extract a traversal path among turns in the dialogue context. Our findings demonstrate the performance gains of this approach compared to sequential or graph-based learning approaches. To combine visual and language reasoning, we adopted compositionality to encode questions as sequential reasoning programs, parameterized by entities and actions, which are used to extract more refined features from video inputs. We denoted this approach the Video-grounded Neural Module Network (VGNMN). From experiments with VGNMN, we found not only potential performance gains on automatic metrics but also improved interpretability through the learned reasoning programs.
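A basic operation shared by transformer-style models of this kind is cross-modal attention, in which a text representation attends over video features. The following is a minimal sketch of scaled dot-product cross-attention; the shapes and the use of video features as both keys and values are simplifying assumptions, not the exact design of MTN or BiST.

```python
# Minimal sketch of scaled dot-product cross-attention: each text token
# forms a weighted mixture of video-frame features.
import numpy as np

def cross_attention(text_q, video_kv, d=64):
    """text_q: (n_text, d) query vectors; video_kv: (n_frames, d) keys/values."""
    scores = text_q @ video_kv.T / np.sqrt(d)           # token-frame similarity
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over frames
    return weights @ video_kv                           # video-grounded features
```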
In video-grounded dialogue research, we found a major obstacle that hindered our progress: the limitation of data. Very little video-grounded dialogue data is available, and developing a new benchmark involves costly and time-consuming manual annotation. This data limitation essentially prevents a system from acquiring sufficient natural language understanding. We therefore proposed to make use of pretrained language models, such as GPT, to leverage the linguistic dependencies they learn from large-scale text data. In another work, we adopted causality to augment current data with counterfactual samples that support model training. Our findings show that both pretrained models and data augmentation are effective strategies for alleviating the data limitation. To facilitate further research in this field, we developed DVD, a Diagnostic Video-grounded Dialogue benchmark. We built DVD as a diagnostic, synthetic benchmark to fairly evaluate systems by visual and textual complexity. We tested several baselines, from simple heuristic models to complex neural networks, and found that all models fall short in different aspects, from multi-turn textual references to visual object tracking. Our findings suggest that current approaches still perform poorly on DVD and that future approaches should integrate multi-step and multi-modal reasoning capabilities. In view of the above findings, we developed a new sub-task within video-grounded dialogue systems: the Multimodal Dialogue State Tracking (MM-DST) task, which requires a system to maintain a recurring memory, or state, of all visual objects mentioned in the dialogue context. At each dialogue turn, utterances may introduce new visual objects or new object attributes, and a dialogue system is required to update the states of these objects. We leveraged techniques from research on task-oriented dialogues, introduced a new baseline, and discussed our findings. Finally, we concluded the dissertation with a summary of our contributions and a discussion of potential future directions in video-grounded dialogue research.
Extant research has demonstrated robust positive relations between positive affect (PA) and meaning, although the strength of this relationship has been found to vary as a function of both chronological age and time horizon (Hicks et al., 2012). This can be explained by Socioemotional Selectivity Theory (SST), which posits that both older adults and those with a limited time horizon (i.e., who perceive less time remaining in life) tend to focus on emotional goals over knowledge goals. In the current paper, I sought to extend SST's findings to the level of activities by examining how chronological age, time horizon (both existing and manipulated), and one's focus on emotional/knowledge goals influenced the strength of the relationship between the enjoyableness and meaningfulness of specific activities. These hypotheses were tested using an older (Study 1) and a younger adult sample (Study 2). Although none of the hypothesized relations were fully supported, interesting relations were uncovered through exploratory analyses that examined specific activities in terms of their experiential qualities and the joint effects of both positive (PA) and negative affect (NA) on activity-related meaning perceptions. In older adults, I found that for those with a limited time horizon, high-PA activities were less meaningful when also accompanied by NA. In contrast, for those with an expansive time horizon, high-PA activities remained meaningful even when accompanied by NA. In younger adults, I found that those who prioritized emotional goals experienced less meaning from uniformly negative activities compared to those who prioritized knowledge goals. Theoretical and practical implications of the current study are discussed.
Online reviews are prevalent in many modern Web applications, such as e-commerce, crowd-sourced location and check-in platforms. Fueled by the rise of mobile phones that are often the only cameras on hand, reviews are increasingly multimodal, with photos in addition to textual content. In this thesis, we focus on modeling the subjectivity carried in this form of data, with two research objectives.
In the first part, we tackle the problem of detecting the sentiment expressed by a review. This is key to unlocking many applications, e.g., analyzing opinions, monitoring consumer satisfaction, and assessing product quality.
Traditionally, the task of sentiment analysis relies primarily on textual content. We focus on the visual sentiment of review images and develop models to systematically analyze the impact of three factors: image, user, and item. Further investigation leads to a notion of concept orientation that generalizes visual sentiment analysis for Web images. We then observe that, with respect to sentiment detection, images in many cases play a supporting role to text, highlighting the salient aspects of an entity rather than expressing sentiments independently. Therefore, we develop a visual aspect attention mechanism that uses visual information as an alignment signal for pointing out the important sentences of a document. The method is effective when one document is associated with multiple images, as in online reviews, blog posts, social networks, and media articles. Furthermore, we study the use of sentiment as an independent modality in the context of cross-modal retrieval. We first formulate the problem of sentiment-oriented text-to-image retrieval and then propose two approaches for incorporating sentiment into text queries based on metric learning. Each approach embodies a hypothesis about how the sentiment vectors align in the metric space that also contains the text and visual vectors.
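To give a flavor of the metric-learning formulation, the sketch below shows a triplet-style objective in which a text query is combined with a sentiment vector before being matched against image vectors in a shared space. The additive combination is one possible hypothesis about sentiment alignment, used here purely for illustration; the encoders, vector shapes, and margin are hypothetical.

```python
# Minimal sketch of sentiment-oriented metric learning with a triplet loss:
# pull the matching image closer to the sentiment-adjusted query than a
# non-matching image, by at least a margin.
import numpy as np

def triplet_loss(query, sentiment, pos_img, neg_img, margin=0.2):
    q = query + sentiment                        # one way to inject sentiment
    d_pos = np.linalg.norm(q - pos_img)          # distance to matching image
    d_neg = np.linalg.norm(q - neg_img)          # distance to non-matching image
    return max(0.0, margin + d_pos - d_neg)      # hinge: zero once well-separated
```

During training, such a loss would be minimized over many (query, sentiment, positive, negative) tuples so that retrieval reduces to a nearest-neighbor search in the learned space.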
In the second part, we focus on developing models for capturing user preferences from multimodal data. Preference modeling is crucial to recommender systems, which are core to modern online user-based platforms: recommendations guide users through the myriad of options offered to them. In online reviews, for instance, preference manifests in numerical ratings, textual content, and visual images. First, we hypothesize that modeling these modalities jointly results in a more holistic representation of a review and thus more accurate recommendations. Therefore, we propose an approach that captures user preferences by simultaneously modeling a rating prediction component and a review text generation component. Second, we introduce a new generative model of preferences, inspired by the dyadic nature of preference signals. The model is bilateral, making it more apt for bipartite interactions and allowing easy incorporation of auxiliary data from both the user and item sides. Third, we develop a probabilistic framework for modeling preferences involving logged bandit feedback. It addresses the sparsity issue in learning from bandit feedback on publisher sites by leveraging relevant organic feedback from e-commerce sites. Through empirical evaluation, we demonstrate that the proposed framework is effective for recommendation and ad placement systems.
In general, we present multiple approaches to modeling various aspects of sentiment and preference signals from multimodal data. Our work contributes a set of techniques that could be broadly extensible for mining Web data. Additionally, this research facilitates the development of recommender systems, which play a significant role in many online user-based platforms.