Shiv Shankar

Graduate Student

My research centers on developing tools for reasoning and estimation in real-world settings. To this end, I mostly work on machine learning and probabilistic models. I am also interested in grounding this research in applications in sustainability and healthcare.

personal info

name: Shiv Shankar
Who am I?

I am not mind, intelligence, ego; not ear, tongue, nose, eye; not sky, earth, wind, fire.
I am a form of ETERNAL BLISS.

What do I do?
I currently play the role of a PhD student at the University of Massachusetts. I am a member of the InfoFusion Lab and also of the Machine Learning and Data Science Lab (MLDS). My research focuses on causality, variational inference, and statistical estimation, with an emphasis on the challenges of real-world applications. It also often overlaps with reinforcement learning and deep learning. My days are mostly packed with tasks and deadlines (which I inevitably miss), but I try to set aside some time for fun and mischief.
What do I like?
I am curious about all kinds of scientific disciplines. One of my goals in life is to develop some level of understanding of every other scientific field. I also advocate for environmental and social causes. I can often be found discussing research questions with students and professors alike. Occasionally, I also make time for random movies and useless debates about philosophy, science, religion, cultures and anime.
What am I actually good at?
I think only others can give a fair answer to this question.
Recent Updates
  • Our paper on estimating human preferences under identity fragmentation was published at ICML'24
  • Our paper on estimating treatment effects under unknown interference was published at AISTATS'24
  • Our paper on efficient design of instrumental variables for indirect experimentation was published at ICLR'24
  • Our paper on inferring structured outputs via feed-forward neural networks was published at UAI'23
  • Our work on information loss in multimodal fusion and how it can be alleviated was presented at SIAM CSE'23
  • Our paper on conducting A/B testing in an identifier-less internet was published at AISTATS'23
  • Our paper on privacy aware online experimentation was published at WSDM'23
  • Our paper on off-policy evaluation under action-dependent non-stationarity was published at NeurIPS'22
  • Our paper on using variational circuits for training boson sampling models was published at QCE'22
  • Our work on variational boson sampling models was presented at QIP'22
  • Our paper on neuroscience-inspired training losses to improve multimodal fusion models was published at ACL'22

Research

UNITE: A/B testing under Interference with Partial Network Information

Accepted at AISTATS, 2024
project

A/B tests often have to be conducted on subjects that have social connections, for example in experiments on social media or medical interventions to control the spread of an epidemic. In such settings, the SUTVA assumption of randomized controlled trials is violated due to network interference, or spill-over effects: treatments given to group A can also affect the control group B. When the underlying social network is known exactly, prior works have demonstrated how to conduct A/B tests to adequately estimate the global average treatment effect (GATE). In practice, however, it is often impossible to obtain exact knowledge of the underlying network. In this paper, we present estimators that relax this assumption and can identify the GATE while relying only on knowledge of a superset of the neighbors of each subject in the graph. Through theoretical analysis and extensive experiments, we show that the proposed approach outperforms standard estimators.
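
For context, the sketch below shows the standard inverse-probability-weighted "full neighborhood exposure" GATE estimate used when the network is known exactly, i.e. the setting that UNITE relaxes. It is not the paper's estimator; the names (adj, treat, outcome) and the Bernoulli design are illustrative assumptions.

```python
import numpy as np

def gate_full_exposure(adj, treat, outcome, p=0.5):
    """Horvitz-Thompson style GATE estimate under full neighborhood exposure.
    adj: (n, n) 0/1 adjacency matrix, treat: (n,) 0/1 assignments,
    outcome: (n,) observed outcomes, p: Bernoulli treatment probability."""
    n = len(treat)
    degree = adj.sum(axis=1)
    treated_neighbors = adj @ treat
    # A unit is "fully exposed" when it and all its neighbors share its assignment.
    fully_treated = (treat == 1) & (treated_neighbors == degree)
    fully_control = (treat == 0) & (treated_neighbors == 0)
    # Inverse-probability weights for full exposure under an independent Bernoulli(p) design.
    y1 = np.sum(outcome * fully_treated / p ** (degree + 1)) / n
    y0 = np.sum(outcome * fully_control / (1 - p) ** (degree + 1)) / n
    return y1 - y0
```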

Adaptive Instrument Design for Indirect Experiments

Accepted at ICLR, 2024
project

In human-AI systems, AI can only be suggestive and not prescriptive about what a human should do (e.g., how should a student interact with LLMs to learn quicker?). In such cases, how should AI systems interact strategically to quickly estimate what would have happened had the human complied with their suggestions?

Direct Inference of Effect of Treatment (DIET) for a Cookieless World

Accepted at AISTATS, 2023
project

Brands use cookies and device identifiers to link different web visits to the same consumer. However, with increasing demands for privacy, these identifiers are about to be phased out, making identity fragmentation a permanent feature of the online world. Assessing treatment effects via randomized experiments (A/B testing) in such a scenario is challenging because identity fragmentation a) causes users to receive hybrid/mixed treatments, and b) hides the causal link between the historical treatments and the outcome. In this work, we address the problem of estimating treatment effects when a lack of identification leads to incomplete knowledge of historical treatments. This is a challenging problem which has not yet been addressed in the literature. We develop a new method called DIET, which can adjust for users being exposed to mixed treatments without the entire history of treatments being available. Our method takes inspiration from the Cox model and uses a proportional outcome approach, under which we prove that one can obtain consistent estimates of treatment effects even under identity fragmentation. Our experiments, on one simulated and two real datasets, show that our method leads to up to 20% reduction in error and 25% reduction in bias over the naive estimate.

Privacy Aware Experiments without Cookies

Accepted at WSDM, 2023. Patent filed with Adobe (US).
project

Consider two brands that want to jointly test alternate web experiences for their customers with an A/B test. Such collaborative tests are today enabled using third-party cookies, where each brand has information on the identity of visitors to another website, ensuring a consistent treatment experience. With the imminent elimination of third-party cookies, such A/B tests will become untenable. We propose a two-stage experimental design, where the two brands only need to agree on high-level aggregate parameters of the experiment to test the alternate experiences. Our design respects the privacy of customers. We propose an unbiased estimator of the Average Treatment Effect (ATE), and provide a way to use regression adjustment to improve this estimate. On real and simulated data, we show that the approach provides a valid estimate of the ATE and is robust to the proportion of visitors overlapping across the brands. Our demonstration describes how a marketer can design such an experiment and analyze the results.
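
As background for the estimands mentioned above, here is a minimal sketch of a difference-in-means ATE estimate and a standard (Lin-style) regression-adjusted version for a single Bernoulli-randomized experiment. It is illustrative only and does not reproduce the paper's two-brand, two-stage design.

```python
import numpy as np

def ate_difference_in_means(y, t):
    """y: (n,) outcomes, t: (n,) 0/1 treatment indicators."""
    return y[t == 1].mean() - y[t == 0].mean()

def ate_regression_adjusted(y, t, x):
    """Regress y on treatment, centered covariates, and their interaction;
    the coefficient on the treatment indicator is the adjusted ATE estimate."""
    xc = x - x.mean(axis=0)                      # center covariates
    design = np.column_stack([np.ones_like(y), t, xc, t[:, None] * xc])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]
```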

Off-Policy Evaluation for Action-Dependent Non-stationary Environments

Accepted at NeurIPS, 2022
project

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-stationarity). In this work, we take the first steps towards the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a higher-order stationarity assumption such that non-stationarity results in changes over time, but the way changes happen is fixed. We propose OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrument-variable regression to obtain both a lower-bias and a lower-variance estimate of the structure in the changes of a policy's past performances. Finally, we show promising results on how OPEN can be used to predict future performances for several domains inspired by real-world applications that exhibit non-stationarity.

Multimodal fusion via cortical network inspired losses

Accepted at ACL 2022
project

Information integration from different modalities is an active area of research. Human beings and, in general, biological neural systems are quite adept at using a multitude of signals from different sensory perceptive fields to interact with the environment and each other. Recent work in deep fusion models via neural networks has led to substantial improvements over unimodal approaches in areas like speech recognition, emotion recognition and analysis, captioning and image description. However, such research has mostly focused on architectural changes allowing for fusion of different modalities while keeping the model complexity manageable. Inspired by neuroscientific ideas about multisensory integration and processing, we investigate the effect of introducing neural dependencies in the loss functions. Experiments on multimodal sentiment analysis tasks with different models show that our approach provides a consistent performance boost.

High Confidence Off-Policy Variance Estimation

Accepted at AAAI, 2021
project

Many sequential decision-making systems leverage data collected using prior policies to propose a new policy. In critical applications, it is important that high-confidence guarantees on the new policy's behavior are provided before deployment, to ensure that the policy will behave as desired. Prior works have studied high-confidence off-policy estimation of the expected return; however, high-confidence off-policy estimation of the variance of returns can be equally critical for high-risk applications. In this paper, we tackle the previously open problem of estimating and bounding, with high confidence, the variance of returns from off-policy data.
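
For intuition, the sketch below computes a naive plug-in importance-sampling estimate of the mean and variance of returns under an evaluation policy from behavior-policy trajectories; it illustrates the estimand only and is not the paper's high-confidence bound.

```python
import numpy as np

def off_policy_return_variance(rho, G):
    """rho: (n,) per-trajectory importance weights prod_t pi_e(a_t|s_t)/pi_b(a_t|s_t),
    G: (n,) observed trajectory returns."""
    mean_est = np.mean(rho * G)                # E[G] under the evaluation policy
    second_moment_est = np.mean(rho * G ** 2)  # E[G^2] under the evaluation policy
    return second_moment_est - mean_est ** 2   # plug-in estimate of Var[G]
```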

Variational Boson Sampling

Accepted at QCE 2022 and ECML 2022
project

Boson samplers are near-term quantum devices based on photonic quantum technology, which can outperform classical computing systems. This paper takes a hybrid circuit-learning approach to utilize boson samplers as a generative model, called Variational Boson Sampling (VBS). VBS introduces an optimizable parametric structure into the evolution operator for boson sampling and uses the complete model as a variational ansatz. To simulate working with real quantum devices, we use gradient-free optimization methods to optimize the resultant circuit. We experiment with this framework on problems in optimization and generative modeling.
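
The snippet below sketches the kind of gradient-free training loop described above, with a stand-in quadratic cost in place of the sampled boson-sampler objective; circuit_cost and the parameter shape are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import minimize

def circuit_cost(theta):
    # Placeholder for a cost estimated by sampling the parametrized circuit,
    # e.g. a divergence between sampled and target output distributions.
    return np.sum((theta - 0.5) ** 2)

theta0 = np.random.uniform(0.0, 1.0, size=8)                    # initial circuit parameters
result = minimize(circuit_cost, theta0, method="Nelder-Mead")   # gradient-free optimizer
print(result.x)
```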

Bosonic Random Walk Neural Networks for Graph Learning

Accepted at Complex Networks, 2021 and NeurIPS, 2020
project

Implicit Training of Inference Network Models for Structured Prediction

Accepted at UAI, 2023
project

Most research in deep learning has predominantly focused on the development of new models and training procedures. In contrast, the exploration of training objectives has received considerably less attention, often limited to combinations of standard losses. When dealing with complex structured outputs, the effectiveness of conventional objectives as proxies for the true objective can be questionable. In this study, we propose that existing inference network-based methods for structured prediction indirectly learn to optimize a dynamic loss objective parameterized by the energy model. Based on this insight, we propose a method that treats the energy network as a trainable loss function and employs an implicit-gradient-based technique to learn the corresponding dynamic objective. We experiment with multiple tasks such as multi-label classification, entity recognition, etc., and find significant performance improvements over baseline approaches. Our results demonstrate that implicitly learning a dynamic loss landscape is an effective approach for enhancing model performance in structured prediction tasks.

Optimizing for the Future in Non-Stationary MDPs

Accepted at ICML, 2020
project

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to the counterfactual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques on three simulated problems motivated by real-world applications.
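
A much-simplified sketch of the forecasting step: fit a curve to per-episode performance estimates and extrapolate one episode ahead. The actual method uses counterfactual (importance-sampled) performance estimates and differentiates the forecast to obtain a policy gradient; the function name and polynomial choice below are illustrative assumptions.

```python
import numpy as np

def forecast_next_performance(perf_history, degree=2):
    """perf_history: per-episode performance estimates J_1, ..., J_k."""
    k = len(perf_history)
    episodes = np.arange(k)
    coeffs = np.polyfit(episodes, perf_history, deg=degree)  # least-squares curve fit
    return np.polyval(coeffs, k)                             # forecast for episode k + 1
```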

Posterior attention models for sequence to sequence learning

Accepted at ICLR, 2019
project

Modern neural architectures critically rely on attention for mapping structured inputs to sequences. In this paper we show that prevalent attention architectures do not adequately model the dependence between the attention and output tokens across a predicted sequence. We present an alternative architecture, called Posterior Attention Models, which, after a principled factorization of the full joint distribution of the attention and output variables, makes two major changes. First, the position where attention is marginalized is moved from the input to the output. Second, the attention propagated to the next decoding stage is a posterior attention distribution conditioned on the output. Empirically, on five translation and two morphological inflection tasks, the proposed posterior attention models yield better BLEU scores and alignment accuracy than existing attention models.
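
A toy illustration of the second change, computing the posterior attention via Bayes' rule once the output token is known; the variable names and shapes are assumptions made for the sake of the example.

```python
import torch

def posterior_attention(prior_attn, per_pos_output, y):
    """prior_attn: (src_len,) attention probabilities before emitting a token,
    per_pos_output: (src_len, vocab) output distributions conditioned on attending
    to each source position, y: index of the emitted token.
    Returns p(a | y) proportional to p(a) * p(y | a)."""
    post = prior_attn * per_pos_output[:, y]
    return post / post.sum()
```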

Surprisingly easy hard-attention for sequence to sequence learning

Accepted at EMNLP, 2018
project

In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence to sequence learning. The method combines the advantage of sharp focus in hard attention with the implementation ease of soft attention. On five translation tasks we show effortless and consistent gains in BLEU compared to existing attention mechanisms.
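
A toy sketch of the beam (top-k) approximation: instead of blending encoder states into a single context, keep the k most probable attention positions, compute an output distribution for each, and average them under the renormalized attention weights. The function, shapes, and the out_proj layer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def beam_hard_attention_output(attn_logits, enc_states, out_proj, k=3):
    """attn_logits: (src_len,), enc_states: (src_len, d), out_proj: nn.Linear(d, vocab)."""
    attn = F.softmax(attn_logits, dim=0)
    top_p, top_idx = attn.topk(k)                 # beam over attention positions
    top_p = top_p / top_p.sum()                   # renormalize over the beam
    # p(y) is approximated by the beam sum of p(a) * p(y | a)
    per_pos = F.softmax(out_proj(enc_states[top_idx]), dim=-1)   # (k, vocab)
    return (top_p.unsqueeze(-1) * per_pos).sum(dim=0)            # (vocab,)
```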

Generalizing Across Domains via Cross-Gradient Training

Accepted at ICLR, 2018
project

We present CROSSGRAD, a method to use multi-domain training data to learn a classifier that generalizes to new domains. CROSSGRAD does not need an adaptation phase via labeled or unlabeled data, or domain features in the new domain. Most existing domain adaptation methods attempt to erase domain signals using techniques like domain adversarial training. In contrast, CROSSGRAD is free to use domain signals for predicting labels, if it can prevent overfitting on training domains. We conceptualize the task in a Bayesian setting, in which a sampling step is implemented as data augmentation based on domain-guided perturbations of input instances. CROSSGRAD trains a label and a domain classifier in parallel on examples perturbed by loss gradients of each other’s objectives. This enables us to directly perturb inputs, without separating and re-mixing domain signals while making various distributional assumptions. Empirical evaluation on three different applications where this setting is natural establishes that (1) domain-guided perturbation provides consistently better generalization to unseen domains compared to generic instance perturbation methods, and that (2) data augmentation is a more stable and accurate method than domain adversarial training.
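
A rough PyTorch sketch of one CROSSGRAD-style update as described above: each classifier is trained on inputs perturbed along the input gradient of the other classifier's loss. The function signature, the perturbation scale eps, and the mixing weight alpha are simplifying assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def crossgrad_step(x, y_label, y_domain, label_model, domain_model,
                   opt_label, opt_domain, eps=1e-2, alpha=0.5):
    x = x.clone().requires_grad_(True)
    grad_l, = torch.autograd.grad(F.cross_entropy(label_model(x), y_label), x)
    grad_d, = torch.autograd.grad(F.cross_entropy(domain_model(x), y_domain), x)
    x_for_label = (x + eps * grad_d).detach()    # domain-guided perturbation
    x_for_domain = (x + eps * grad_l).detach()   # label-guided perturbation
    x = x.detach()

    # Train the label classifier on clean and domain-perturbed inputs.
    opt_label.zero_grad()
    label_loss = (1 - alpha) * F.cross_entropy(label_model(x), y_label) \
                 + alpha * F.cross_entropy(label_model(x_for_label), y_label)
    label_loss.backward()
    opt_label.step()

    # Train the domain classifier on clean and label-perturbed inputs.
    opt_domain.zero_grad()
    domain_loss = (1 - alpha) * F.cross_entropy(domain_model(x), y_domain) \
                  + alpha * F.cross_entropy(domain_model(x_for_domain), y_domain)
    domain_loss.backward()
    opt_domain.step()
```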

f4: Facebook's warm BLOB storage system

Accepted at USENIX OSDI, 2014
project

Facebook’s corpus of photos, videos, and other Binary Large OBjects (BLOBs) that need to be reliably stored and quickly accessible is massive and continues to grow. As the footprint of BLOBs increases, storing them in our traditional storage system, Haystack, is becoming increasingly inefficient. To increase our storage efficiency, measured in the effective-replication-factor of BLOBs, we examine the underlying access patterns of BLOBs and identify temperature zones that include hot BLOBs that are accessed frequently and warm BLOBs that are accessed far less often. Our overall BLOB storage system is designed to isolate warm BLOBs and enable us to use a specialized warm BLOB storage system, f4. f4 is a new system that lowers the effective-replication-factor of warm BLOBs while remaining fault tolerant and able to support the lower throughput demands.
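
A back-of-the-envelope illustration of the effective-replication-factor metric mentioned above (physical bytes stored per logical byte). The specific schemes compared, plain triple replication versus a (10, 4) Reed-Solomon code kept in two regions, are illustrative assumptions rather than an exact description of the deployment.

```python
def effective_replication_factor(data_blocks, parity_blocks, copies):
    """Physical bytes per logical byte when `copies` copies of an erasure-coded
    volume with `data_blocks` data and `parity_blocks` parity blocks are kept."""
    return copies * (data_blocks + parity_blocks) / data_blocks

print(effective_replication_factor(1, 0, 3))    # 3-way replication   -> 3.0
print(effective_replication_factor(10, 4, 2))   # RS(10, 4) x 2 sites -> 2.8
```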

Send a message

ssh[complete my lastname]@umass.edu

Visit me

College of Information and Computer Sciences, Governors Drive, University of Massachusetts

Languages

English: Multilingual/Native Proficiency
Hindi: Multilingual/Native Proficiency
French: Functional Proficiency