Ashutosh Saxena

Ashutosh Saxena

Ashutosh Saxena is an Indian-American computer scientist, researcher, and entrepreneur known for his contributions to the field of artificial intelligence and large-scale robot learning. His interests include building enterprise AI agents and embodied AI. Saxena is the co-founder and CEO of Caspar.AI, where generative AI parses data from ambient 3D radar sensors to predict 20+ health & wellness markers for pro-active patient care. Prior to Caspar.AI, Ashutosh co-founded Cognical Katapult (NSDQ: KPLT), which provides a no credit required alternative to traditional financing for online and omni-channel retail. Before Katapult, Saxena was an assistant professor in the Computer Science Department and faculty director of the RoboBrain Project (a large-scale AI model for robotics) at Cornell University. == Education == In 2009, with artificial intelligence pioneer Andrew Ng as his advisor, Saxena received both his M.S. and Ph.D. in computer science with an emphasis on artificial intelligence from Stanford University. Saxena received his bachelor's degree in electrical engineering from the Indian Institute of Technology, Kanpur in 2004. == Career == Saxena was the chief scientist of New York-based Holopad, where he worked with Steven Spielberg's team to create walkthroughs and 3D experiences for his movie TinTin. His past experiences include building acoustic AI models at Bose Corporation. Once Ashutosh completed his undergraduate degree, he became a researcher at the Commonwealth Scientific and Industrial Research Organization, where he developed AI models for medical devices. Before Caspar, Saxena pursued other entrepreneurial ventures, such as ZunaVision, an artificial intelligence startup he co-founded with Andrew Ng that uses AI to embed advertising space within videos. Ashutosh served as the CTO of ZunaVision from 2008 to 2010. After ZunaVision, Saxena co-founded Cognical Katapult, which provided financing solutions to nonprime and underbanked consumers powered by artificial intelligence. From 2014 to 2016, Saxena served as the faculty director of the RoboBrain project, which was a joint venture that he started between Stanford University, Cornell University, Brown University, and the University of California, Berkeley that made a knowledge engine for robots. Saxena co-founded Brain of Things in 2015 with David Cheriton, who serves as chief scientist, and was listed as the fastest growing private company reaching an annual recurring revenue of $8 million in three years. It has been widely covered in several outlets including Forbes Japan, and MIT Technology Review. Saxena's work on deep learning won test of time award in 2023 by Robotics Science and Systems. Ashutosh has been recognized for his work by receiving the Alfred P. Sloan Fellow in 2011, Google Faculty Research Award in 2012, Microsoft Faculty Fellowship in 2012, NSF Career award in 2013, One of the Eight Innovators to Watch by the Smithsonian Institution in 2015, and received TR35 Innovator Award by MIT Technology Review in 2018. He was named by San Francisco Business Times as a 40 under 40 young business leader. == Research == Saxena has authored over 100 published papers in the areas of large-scale robot learning and artificial intelligence, with 20,000+ citations. His work in the fields of computer vision and deep learning have been featured in press releases and academic journal reviews. Ashutosh's early work includes the Stanford Artificial Intelligence Robot (STAIR), an AI models that enables to perform tasks such as unload items from a dishwasher, which was covered on the front-page of New York Times. His work on Make3D, was the first work that estimated 3D depth from a single still image. At Cornell University, Ashutosh led the Robot Learning Lab, which used a machine learning approach to train robots to perform tasks in human environments such as generalizing manipulation in 3D point-clouds where robots learn to transfer manipulation trajectories to novel objects utilizing a large sample of demonstrations from crowdsourcing.

Object co-segmentation

In computer vision, object co-segmentation is a special case of image segmentation, which is defined as jointly segmenting semantically similar objects in multiple images or video frames. == Challenges == It is often challenging to extract segmentation masks of a target/object from a noisy collection of images or video frames, which involves object discovery coupled with segmentation. A noisy collection implies that the object/target is present sporadically in a set of images or the object/target disappears intermittently throughout the video of interest. Early methods typically involve mid-level representations such as object proposals. == Dynamic Markov networks-based methods == A joint object discover and co-segmentation method based on coupled dynamic Markov networks has been proposed recently, which claims significant improvements in robustness against irrelevant/noisy video frames. Unlike previous efforts which conveniently assumes the consistent presence of the target objects throughout the input video, this coupled dual dynamic Markov network based algorithm simultaneously carries out both the detection and segmentation tasks with two respective Markov networks jointly updated via belief propagation. Specifically, the Markov network responsible for segmentation is initialized with superpixels and provides information for its Markov counterpart responsible for the object detection task. Conversely, the Markov network responsible for detection builds the object proposal graph with inputs including the spatio-temporal segmentation tubes. == Graph cut-based methods == Graph cut optimization is a popular tool in computer vision, especially in earlier image segmentation applications. As an extension of regular graph cuts, multi-level hypergraph cut is proposed to account for more complex high order correspondences among video groups beyond typical pairwise correlations. With such hypergraph extension, multiple modalities of correspondences, including low-level appearance, saliency, coherent motion and high level features such as object regions, could be seamlessly incorporated in the hyperedge computation. In addition, as a core advantage over co-occurrence based approach, hypergraph implicitly retains more complex correspondences among its vertices, with the hyperedge weights conveniently computed by eigenvalue decomposition of Laplacian matrices. == CNN/LSTM-based methods == In action localization applications, object co-segmentation is also implemented as the segment-tube spatio-temporal detector. Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), Le et al. present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. This Segment-tube detector can temporally pinpoint the starting/ending frame of each action category in the presence of preceding/subsequent interference actions in untrimmed videos. Simultaneously, the Segment-tube detector produces per-frame segmentation masks instead of bounding boxes, offering superior spatial accuracy to tubelets. This is achieved by alternating iterative optimization between temporal action localization and spatial action segmentation. The proposed segment-tube detector is illustrated in the flowchart on the right. The sample input is an untrimmed video containing all frames in a pair figure skating video, with only a portion of these frames belonging to a relevant category (e.g., the DeathSpirals). Initialized with saliency based image segmentation on individual frames, this method first performs temporal action localization step with a cascaded 3D CNN and LSTM, and pinpoints the starting frame and the ending frame of a target action with a coarse-to-fine strategy. Subsequently, the segment-tube detector refines per-frame spatial segmentation with graph cut by focusing on relevant frames identified by the temporal action localization step. The optimization alternates between the temporal action localization and spatial action segmentation in an iterative manner. Upon practical convergence, the final spatio-temporal action localization results are obtained in the format of a sequence of per-frame segmentation masks (bottom row in the flowchart) with precise starting/ending frames.

GPU switching

GPU switching is a mechanism used on computers with multiple graphic controllers. This mechanism allows the user to either maximize the graphic performance or prolong battery life by switching between the graphic cards. It is mostly used on gaming laptops which usually have an integrated graphic device and a discrete video card. == Basic components == Most computers using this feature contain integrated graphics processors and dedicated graphics cards that applies to the following categories. === Integrated graphics === Also known as: Integrated graphics, shared graphics solutions, integrated graphics processors (IGP) or unified memory architecture (UMA). This kind of graphics processors usually have much fewer processing units and share the same memory with the CPU. Sometimes the graphics processors are integrated onto a motherboard. It is commonly known as: on-board graphics. A motherboard with on-board graphics processors doesn't require a discrete graphics card or a CPU with graphics processors to operate. === Dedicated graphics cards === Also known as: discrete graphics cards. Unlike integrated graphics, dedicated graphics cards have much more processing units and have its own RAM with much higher memory bandwidth. In some cases, a dedicated graphics chip can be integrated onto the motherboards, B150-GP104 for example. Regardless of the fact that the graphics chip is integrated, it is still counted as a dedicated graphics cards system because the graphics chip is integrated with its own memory. == Theory == Most Personal Computers have a motherboard that uses a Southbridge and Northbridge structure. === Northbridge control === The Northbridge is one of the core logic chipset that handles communications between the CPU, GPU, RAM and the Southbridge. The discrete graphics card is usually installed onto the graphics card slot such as PCI-Express and the integrated graphics is integrated onto the CPU itself or occasionally onto the Northbridge. The Northbridge is the most responsible for switching between GPUs. The way how it works usually has the following process (refer to the Figure 1. on the right): The Northbridge receives input from Southbridge through the internal bus. The Northbridge signals to CPU through the Front-side bus. The CPU runs the task assignment application (usually the graphics card driver) to determine which GPU core to use. The CPU passes down the command to the Northbridge. The Northbridge passes down the command to the according GPU core. The GPU core processes the command and returns the rendered data back to the Northbridge. The Northbridge sends the rendered data back to Southbridge. === Southbridge control === The Southbridge is a set of integrated circuits such Intel's I/O Controller Hub (ICH). It handles all of a computer's I/O functions, such as receiving the keyboard input and outputting the data onto the screen. The way how it usually works usually has two steps: Take in the user input and pass it down to the Northbridge. (Optional) Receive the rendered data from the Northbridge and output it. The reason why the second step can be optional is that sometimes the rendered the data is outputted directly from the discrete graphics card which is located on the graphics card slot so there is no need to output the data through the Southbridge. == Main purpose == GPU switching is mostly used for saving energy by switching between graphic cards. The dedicated graphics cards consume much more power than integrated graphics but also provides higher 3D performances, which is needed for a better gaming and CAD experience. Following is a list of the TDPs of the most popular CPU with integrated graphics and dedicated graphics cards. The dedicated graphics cards exhibit much higher power consumption than the integrated graphics on both platforms. Disabling them when no heavy graphics processing is needed can significantly lower the power consumption. == Technologies == === Nvidia Optimus === Nvidia Optimus™ is a computer GPU switching technology created by Nvidia that can dynamically and seamlessly switch between two graphic cards based on running programs. === AMD Enduro === AMD Enduro™ is a collective brand developed by AMD that features many new technologies that can significantly save power. It was previously named as: PowerXpress and Dynamic Switchable Graphics (DSG). This technology implements a sophisticated system to predict the potential usage need for graphics cards and switch between graphics cards based on predicted need. This technology also introduces a new power control plan that allows the discrete graphics cards consume no energy when idling. == Manufacturers == === Integrated graphics === In personal computers, the IGP (integrated graphics processors) are mostly manufactured by Intel and AMD and are integrated onto their CPUs. They are commonly known as: Intel HD and Iris Graphics - also called HD series and Iris series AMD Accelerated Processing Unit (APU) - also formerly known as: fusion === Dedicated graphics cards === The most popular dedicated graphics cards are manufactured by AMD and Nvidia. They are commonly known as: AMD Radeon Nvidia GeForce == Drivers and OS support == Most common operating systems have built-in support for this feature. However, the users may download the updated drivers from Nvidia or AMD for better experience. === Windows support === Windows 7 has built-in support for this feature. The system automatically switches between GPUs depending on the program that's running. However, the user may switch the GPUs manually through device manager or power manager. === Linux === Modern Linux systems handle hybrid graphics in two parts: power/control for the inactive GPU, and optional render offloading for individual applications. vga_switcheroo (in the kernel since 2.6.34) coordinates power and mux control on systems with multiple GPUs. It was designed primarily for muxed designs (hardware display switch), and on muxless laptops it is typically used only for power control. A display server restart is no longer required for offloading on muxless systems. DRI PRIME (Mesa) enables per-process render offload on muxless systems: an app renders on the discrete GPU and the integrated GPU presents the result. Users can opt in via the DRI_PRIME environment variable (e.g., DRI_PRIME=1) or desktop integration. On GNOME, the switcheroo-control service exposes the discrete GPU to the shell, adding a “Launch using Discrete Graphics Card” entry to app menus on supported systems (Wayland or Xorg), which invokes render offload under the hood. With the proprietary Nvidia driver, render offload is provided as PRIME Render Offload (supported since driver 435.xx). Distributions commonly ship a helper like prime-run or desktop menu entries that set the required environment for offloading. ==== Notes and limitations (Linux) ==== On muxless systems the internal display is hard-wired to the integrated GPU; the discrete GPU cannot directly drive that panel and instead renders offscreen for composition by the iGPU. External displays connected to the dGPU may allow direct output depending on the laptop’s wiring. Power-saving behavior varies by driver and distro defaults. Some setups need explicit configuration to power down the inactive GPU when idle. Desktop integrations (e.g., GNOME's menu item) simply opt an app into offload; they do not "auto-switch" the whole session. Users can still launch apps on either GPU as needed.

Polyfill (programming)

In software development, a polyfill is code that implements a new standard feature of a deployment environment within an old version of that environment that does not natively support the feature. Most often, it refers to JavaScript code that implements an HTML5 or CSS web standard, either an established standard (supported by some browsers) on older browsers, or a proposed standard (not supported by any browsers) on existing browsers. Polyfills are also used in PHP and Python. Polyfills allow web developers to use an API regardless of whether or not it is supported by a browser, and usually with minimal overhead. Typically they first check if a browser supports an API, and use it if available, otherwise using their own implementation. Polyfills themselves use other, more supported features, and thus different polyfills may be needed for different browsers. The term is also used as a verb: polyfilling is providing a polyfill for a feature. == Definition == The term is a neologism, coined by Remy Sharp, who required a word that meant "replicate an API using JavaScript (or Flash or whatever) if the browser doesn’t have it natively" while co-writing the book Introducing HTML5 in 2009. Formally, "a shim is a library that brings a new API to an older environment, using only the means of that environment." Polyfills exactly fit this definition; the term shim was also used for early polyfills. However, to Sharp shim connoted non-transparent APIs and workarounds, such as spacer GIFs for layout, sometimes known as shim.gif, and similar terms such as progressive enhancement and graceful degradation were not appropriate, so he invented a new term. The term is based on the multipurpose filling paste brand Polyfilla, a paste used to cover up cracks and holes in walls, and the meaning "fill in holes (in functionality) in many (poly-) ways." The word has since gained popularity, particularly due to its use by Paul Irish and in Modernizr documentation. The distinction that Sharp makes is: What makes a polyfill different from the techniques we have already, like a shim, is this: if you removed the polyfill script, your code would continue to work, without any changes required in spite of the polyfill being removed. This distinction is not drawn by other authors. At times various other distinctions are drawn between shims, polyfills, and fallbacks, but there are no generally accepted distinctions: most consider polyfills a form of shim. The term polyfiller is also occasionally found. == Examples == === core-js === core-js is one of the most popular JavaScript standard library polyfills. Includes polyfills for ECMAScript up to the latest version of the standard: promises, symbols, collections, iterators, typed arrays, many other features, ECMAScript proposals, some cross-platform WHATWG / W3C features and proposals like URL. You can load only required features or use it without global namespace pollution. It can be integrated with Babel, which allows it to automatically inject required core-js modules into your code. === html5shiv === In IE versions prior to 9, unknown HTML elements like

and