David Horn (Israeli physicist)

David Horn (Hebrew: דוד הורן; born 10 September 1937) is a Professor (Emeritus) of Physics in the School of Physics and Astronomy at Tel Aviv University (TAU), Israel. He has served as Vice-Rector of TAU, Chairman of the School of Physics and Astronomy and as Dean of the Faculty of Exact Sciences in TAU. He is a fellow of the American Physical Society, nominated for "contributions to theoretical particle physics, including the seminal work on finite energy sum rules, research of the phenomenology of hadronic processes, and investigation of Hamiltonian lattice theories". == Early life and education == David Horn was born and educated in Haifa. He graduated from the Reali School in 1955. He began his academic studies in Physics at the Technion in Haifa in 1957, and received his B.Sc. (Summa Cum Laude) in 1961, and M.Sc. in 1962. He continued his Ph.D. studies at the Hebrew University of Jerusalem until 1965. His thesis on "Some Aspects of the Structure of Weak Interactions" was supervised by Prof. Yuval Ne'eman. == Career == Horn joined the newly founded Tel Aviv University as an assistant in 1962. He became a lecturer in 1965, a senior lecturer in 1967 and an associate professor in 1968. He was promoted to full professor of Physics in 1972. In 1974 he became the incumbent of the Edouard and Francoise Jaupart Chair of Theoretical Physics of Particles and Fields, a position he held until 2007. Horn has supervised 43 graduate students at TAU and authored over 240 scientific publications. He retired as a professor emeritus in 2005, and continues to be an active researcher. 
Horn spent a significant part of his career in visiting academic positions at other universities and research institutes, including: Postdoctoral Fellow at Argonne National Lab, IL; Research Fellow and three times Visiting Associate at the California Institute of Technology, Pasadena, CA; Visitor at CERN in Geneva; Visiting Professor at Cornell University, NY; Member of the Institute for Advanced Study, Princeton, NJ; Visiting Professor at SLAC at Stanford University, CA; and Visiting Professor at Kyoto University, Japan. Beginning in 1980, Horn held official positions at Tel Aviv University, starting with tenure as Vice-Rector (1980-1983), a position he left for research at SLAC. After returning he was nominated Chairman of the Department of High Energy Physics (1984-1986), followed by tenures as Chairman of the School of Physics and Astronomy (1986-1989), Dean of the Raymond and Beverly Sackler Faculty of Exact Sciences (1990-1995), and first Director of the Adams Super Center for Brain Studies (1993-2000). Horn has also held national and international professional positions. He was Chairman of the Israel Commission for High Energy Physics (1983-2003) and, in this capacity, served as an Israeli observer on the council of CERN (1991-2003). He served as a member of the Israel Council for Higher Education (1987-1991), a member of the Executive Committee of the European Physical Society (1989-1992) and a member of the European Strategy Forum on Research Infrastructures (2005-2017). He chaired the Israeli Committee of Research Infrastructures (2012-2016), issuing roadmaps for scientific research infrastructures (RI) in 2013 and 2016. == Research == Horn's research focused on the theory and phenomenology of high-energy physics until 1990. He then shifted his interests to neural computation and machine learning and, since 2005, he has also published in bioinformatics. Together with Richard Dolen and Christoph Schmid he discovered the Finite Energy Sum Rules in 1967. 
This was a realization of the bootstrap approach to hadronic structure, and became known as the Dolen-Horn-Schmid Duality. Together with Richard Silver he investigated a model of coherent production of pions in high-energy hadron collisions in 1971, and together with Jeffrey Mandula he undertook the investigation of mesons with constituent gluons in 1978. Moving to lattice gauge theories in 1979, he discovered, together with Shimon Yankielowicz and Marvin Weinstein, a non-confining phase in Z(N) theories for large N. In 1981 he demonstrated the existence of finite matrix models with link gauge fields, nowadays known as quantum link models. In 1984 Horn and Weinstein developed the t-expansion methodology. Horn's contributions to neural modeling include a novel mechanism for memory maintenance via neuronal regulation in 1998, developed with Nir Levy and Eytan Ruppin, and unsupervised learning of natural languages in 2005, a joint work with Zach Solan, Eytan Ruppin and Shimon Edelman that introduced novel algorithms for motif and grammar extraction from text. Horn has contributed to clustering algorithms, an important topic in machine learning, by developing Support Vector Clustering (SVC) in 2001, together with Asa Ben-Hur, Hava Siegelmann and Vladimir Vapnik. This was followed shortly thereafter by a joint work with Assaf Gottlieb on Quantum Clustering (QC). His contributions to bioinformatics include motif descriptions of the function and structure of proteins, as well as motif studies of genomic structures. Together with Erez Persi he studied the compositional order of proteomes, and repeat instability of genomes, as evolution markers of organisms and of cancer (a joint work with Persi and others). == Honors == Horn is a Fellow of the American Physical Society (1985) and a Fellow of the Israel Physical Society (2018). == Publications == === Selected articles === R. Dolen, D. Horn and C. 
Schmid; Prediction of Regge-parameters of rho poles from low-energy pi-N scattering data Phys. Rev. Lett. 19 (1967) 402–407. Finite-Energy Sum Rules and Their Application to pi-N Charge Exchange Phys. Rev. 166 (1968) 1768–1781. D. Horn and R. Silver: Coherent production of pions, Annals Phys. 66 (1971) 509-541 T. Banks, D. Horn and H. Neuberger: Bosonization of the SU(N) Thirring Models, Nucl. Phys. B108, 119 (1976). D. Horn and J. Mandula: Model of Mesons with Constituent Gluons, Phys. Rev. D17, 898 (1978). D. Horn, M. Weinstein and S. Yankielowicz: Hamiltonian Approach to Z(N) Lattice Gauge Theories, Phys. Rev. D19, 3715 (1979). D. Horn: Finite Matrix Models with Continuous Local Gauge Invariance, Phys. Lett. 100B, 149-151 (1981). T. Banks, Y. Dothan and D. Horn: Geometric Fermions, Phys. Lett. 117B, 413 (1982). D. Horn and M. Weinstein: The t expansion: A nonperturbative analytic tool for Hamiltonian systems. Phys. Rev. D 30, 1256-1270 (1984). Ury Naftaly, Nathan Intrator and David Horn: Optimal Ensemble Averaging of Neural Networks. Network, Computation in Neural Systems, 8, 283-296 (1997). David Horn, Nir Levy, Eytan Ruppin: Memory Maintenance via Neuronal Regulation, Neural Computation, 10, 1-18 (1998). Asa Ben-Hur, David Horn, Hava Siegelmann and Vladimir Vapnik: Support Vector Clustering. Journal of Machine Learning Research 2, 125-137 (2001). David Horn and Assaf Gottlieb: Algorithm for data clustering in pattern recognition problems based on quantum mechanics, Phys. Rev. Lett. 88 (2002) 18702 Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman: Unsupervised learning of natural languages, Proc. Natl. Acad. Sc. 102 (2005) 11629–11634. Vered Kunik, Yasmine Meroz, Zach Solan, Ben Sandbank, Uri Weingart, Eytan Ruppin and David Horn: Functional representation of enzymes by specific peptides. PLOS Computational Biology 2007, 3(8):e167. Benny Chor, David Horn, Yaron Levy, Nick Goldman and Tim Massingham: Genomic DNA k-mer spectra: models and modalities. 
Genome Biology 2009, 10(10):R108. Erez Persi and David Horn: Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution. PLoS Comput Biol 9 (2013): e1003346. David Horn: Taxa counting using Specific Peptides of Aminoacyl tRNA Synthetases. Encyclopedia of Metagenomics, Springer, 2013. Sagi Shporer, Benny Chor, Saharon Rosset, David Horn: Inversion symmetry of DNA k-mer counts: validity and deviations. BMC Genomics 2016, 17:696. Erez Persi, Davide Prandi, Yuri I. Wolf, Yair Pozniak, Christopher Barbieri, Paola Gasperini, Himisha Beltran, Bishoy M. Faltas, Mark A. Rubin, Tamar Geiger, Eugene V. Koonin, Francesca Demichelis, David Horn: Proteomic and Genomic Signatures of Repeat Instability in Cancer and Adjacent Normal Tissues. PNAS 116, 34, 2019 - 08790. === Book === David Horn and Fredrick Zachariasen: Hadron Physics at Very High Energies. Benjamin 1973. === Patents === Method and Apparatus for Quantum Clustering. USA Patent No. 7,653,646 B2. Method for discovering relationships in data by dynamic quantum clustering. USA Patent No. 8,874,412 and USA Patent No. 9,646,074. == Personal life == Horn was married to Nira Fuss from 1963 until her death in 2019. He is a father of three, Yuval, Tamar, and Oded, and grandfather of nine. He lives in Tel Aviv, Israel.

MobileNet

MobileNet is a family of convolutional neural network (CNN) architectures designed for image classification, object detection, and other computer vision tasks. They are designed for small size, low latency, and low power consumption, making them suitable for on-device inference and edge computing on resource-constrained devices such as mobile phones and embedded systems. They were originally designed to run efficiently on mobile devices with TensorFlow Lite. The need for efficient deep learning models on mobile devices led researchers at Google to develop MobileNet. As of June 2025, the family has five versions, each improving upon the previous one in terms of performance and efficiency. == Features == === V1 === MobileNetV1 was published in April 2017. Its main architectural innovation was the incorporation of depthwise separable convolutions. The depthwise separable convolution was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and model size. It decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently and a pointwise convolution (a 1 × 1 convolution) that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. MobileNetV1 has two hyperparameters: a width multiplier α, which controls the number of channels in each layer, and a resolution multiplier ρ, which controls the input resolution of the images. Smaller values of α lead to smaller and faster models at the cost of reduced accuracy; lower resolutions result in faster processing but potentially lower accuracy. === V2 === MobileNetV2 was published in March 2019. It uses inverted residual layers and linear bottlenecks. Inverted residuals modify the traditional residual block structure. 
Instead of compressing the input channels before the depthwise convolution, they expand them. This expansion is followed by a 3 × 3 depthwise convolution and then a 1 × 1 projection layer that reduces the number of channels back down. This inverted structure helps to maintain representational capacity by allowing the depthwise convolution to operate on a higher-dimensional feature space, thus preserving more information flow during the convolutional process. Linear bottlenecks remove the typical ReLU activation function in the projection layers. This was rationalized by arguing that nonlinear activation loses information in lower-dimensional spaces, which is problematic when the number of channels is already small. === V3 === MobileNetV3 was published in 2019. The publication included MobileNetV3-Small, MobileNetV3-Large, and MobileNetEdgeTPU (optimized for Pixel 4). They were found by a form of neural architecture search (NAS) that takes mobile latency into account, to achieve a good trade-off between accuracy and latency. They used piecewise-linear approximations of the swish and sigmoid activation functions (which the authors called "h-swish" and "h-sigmoid"), squeeze-and-excitation modules, and the inverted bottlenecks of MobileNetV2. === V4 === MobileNetV4 was published in September 2024. The publication included a large number of architectures found by NAS. Inspired by Vision Transformers, the V4 series included multi-query attention. It also unified both inverted residual and inverted bottleneck from the V3 series with the "universal inverted bottleneck", which includes these two as special cases. === V5 === MobileNetV5's architecture was published shortly after the release of Gemma 3n in June 2025. While the announcement stated a technical report on MobileNetV5 would be available soon, this has not yet materialised. The network is 10 times larger than the largest V4 variant.
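As a rough illustration of why the depthwise separable factorization described under V1 reduces computational cost, the following sketch counts multiply-accumulate operations for a single layer. The function names, the assumption of stride 1 with an output feature map the same size as the input, and the use of int() to round scaled sizes are illustrative choices, not part of any MobileNet implementation.

```python
def standard_conv_cost(k, c_in, c_out, f):
    """Multiply-accumulates for a k x k standard convolution
    on an f x f feature map (stride 1, same-size output assumed)."""
    return k * k * c_in * c_out * f * f


def separable_conv_cost(k, c_in, c_out, f):
    """Multiply-accumulates for a depthwise separable convolution."""
    depthwise = k * k * c_in * f * f   # one k x k filter per input channel
    pointwise = c_in * c_out * f * f   # 1 x 1 convolution mixing channels
    return depthwise + pointwise


def scaled_separable_cost(k, c_in, c_out, f, alpha=1.0, rho=1.0):
    """Cost after applying a width multiplier (alpha) to the channel
    counts and a resolution multiplier (rho) to the feature-map size.
    Rounding with int() is an assumption made for this sketch."""
    return separable_conv_cost(
        k, int(alpha * c_in), int(alpha * c_out), int(rho * f)
    )


if __name__ == "__main__":
    k, c_in, c_out, f = 3, 64, 128, 56   # hypothetical layer sizes
    std = standard_conv_cost(k, c_in, c_out, f)
    sep = separable_conv_cost(k, c_in, c_out, f)
    print("standard: ", std)
    print("separable:", sep)
    print("ratio:    ", sep / std)       # equals 1/c_out + 1/k**2
    print("alpha=0.5:", scaled_separable_cost(k, c_in, c_out, f, alpha=0.5))
```

For a 3 × 3 kernel the ratio is dominated by the 1/9 term, so the factorized layer needs roughly an eighth to a ninth of the operations of the standard one; the multipliers α and ρ reduce the cost further, roughly quadratically in each.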

JustWatch

JustWatch is a website that provides information on the availability of films and TV shows on various streaming platforms such as Netflix, HBO Max, Disney+, Hulu, Peacock, Fandango at Home, Apple TV, and Amazon Prime Video, among others. It is also available as a mobile application and smart TV application. JustWatch provides a search engine that allows users to discover which digital platforms host a particular movie or TV series. As of November 2023, JustWatch is available to users in 139 countries. == Features == JustWatch functions as a search engine by aggregating information about the online availability of films and TV series from video-on-demand streaming services. It aggregates information from more than 100 video content libraries, as well as providing information about video resolution quality, pricing, and purchase or rental options. The website includes various filters for searching, including genre, price, release date, rating, and popularity. Users are also able to create lists of shows and movies and to share these lists with other users. == History == JustWatch GmbH is a privately held international database company headquartered in Berlin, Germany. The company specializes in the online availability of movies and TV series. In addition to its user-facing website, the company also has an advertising-focused arm, JustWatch Media, that works with corporate clients, using data about what people watch, gleaned from user behavior, to help entertainment companies tailor their marketing strategies. Its clients include Universal Pictures, Paramount Pictures, and Sony Pictures, among others. Development of the website began in 2014, and it was launched in the U.S. and Germany in February 2015. In 2018, the company received funding to improve databases within the European Union. In December 2019, the company acquired a rival streaming aggregation service, GoWatchIt, from Plexus Entertainment. 
JustWatch also used the acquisition to open its first New York office. In 2019, JustWatch had over 30 million users across 38 countries. By 2020, the company's streaming aggregation service was available in over 45 countries. By November 2023, it was available in 139 countries, and had over 40 million monthly users. === Founding === JustWatch was co-founded in 2013 by David Croyé, Cristoph Hoyer, Kevin Hiller, Dominik Raute, Ingke Weimert, and Michael Wilken. In a company blog post from February 2017, Croyé described the group of co-founders as all having previously "worked in leading roles at successful international tech-startups in Berlin." Croyé, who currently holds the title of CEO at JustWatch GmbH, had previously worked as the chief marketing officer at kaufDA, a European location-based mobile coupon and promotion service, and the background of other co-founders included time at the adtech company Trademob and the streaming site MyVideo. Startup capital for the website initially came from the founders themselves. Croyé in particular was able to reinvest funds he had obtained from the sale of kaufDA to Axel Springer, a European media company, in March 2011. Since 2015, the company has had at least one additional round of seed funding, with investors including venture capital groups CG Partners and STS Ventures.

ImageNet

The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes. == History == AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms. In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features. She was also inspired by a 1987 estimate that the average person recognizes roughly 30,000 different kinds of objects. As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images. Labeling started in July 2008 and ended in April 2010. It took 49K workers from 167 countries to filter and label over 160M candidate images. They had enough budget to have each of the 14 million images labelled three times. 
The original plan called for 10,000 images per category, for 40,000 categories at 400 million images, each verified 3 times. They found that humans can classify at most 2 images/sec. At this rate, it was estimated to take 19 human-years of labor (without rest). They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset". The poster was reused at Vision Sciences Society 2009. In 2009, Alex Berg suggested adding object localization as a task. Li approached PASCAL Visual Object Classes contest in 2009 for a collaboration. It resulted in the subsequent ImageNet Large Scale Visual Recognition Challenge starting in 2010, which has 1000 classes and object localization, as compared to PASCAL VOC which had just 20 classes and 19,737 images (in 2010). === Significance for deep learning === On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training, an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest, having 3.57% error on the test set. Andrej Karpathy estimated in 2014 that with concentrated effort, he could reach 5.1% error rate, and ~10 people from his lab reached ~12-13% with less effort. It was estimated that with maximal effort, a human could reach 2.4%. == Dataset == ImageNet crowdsources its annotation process. 
Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification. In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute. The original plan for the full ImageNet called for roughly 50M clean, diverse and full-resolution images spread over approximately 50K synsets. This was not achieved. The summary statistics given on April 30, 2010 were: total number of non-empty synsets: 21,841; total number of images: 14,197,122; number of images with bounding box annotations: 1,034,908; number of synsets with SIFT features: 1,000; number of images with SIFT features: 1.2 million. === Categories === The categories of ImageNet were filtered from the WordNet concepts. Since each concept can contain multiple synonyms (for example, "kitty" and "young cat"), each concept is called a "synonym set" or "synset". There were more than 100,000 synsets in WordNet 3.0, the majority of them nouns (80,000+). The ImageNet dataset filtered these to 21,841 synsets that are countable nouns that can be visually illustrated. Each synset in WordNet 3.0 has a "WordNet ID" (wnid), which is a concatenation of the part of speech and an "offset" (a unique identifying number). Every wnid starts with "n" because ImageNet only includes nouns. For example, the wnid of the synset "dog, domestic dog, Canis familiaris" is "n02084071". The categories in ImageNet fall into 9 levels, from level 1 (such as "mammal") to level 9 (such as "German shepherd"). 
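The wnid structure described above (a part-of-speech letter followed by a numeric offset) can be sketched as a pair of helper functions. The names parse_wnid and make_wnid are hypothetical, and the eight-digit zero-padded offset width is an assumption inferred from the example "n02084071" rather than something stated in the text.

```python
def parse_wnid(wnid: str) -> tuple[str, int]:
    """Split a wnid such as 'n02084071' into (part of speech, offset).

    Assumes one part-of-speech letter followed by an 8-digit offset,
    as in the example above; ImageNet wnids always use 'n' (noun).
    """
    pos, digits = wnid[:1], wnid[1:]
    if pos != "n":
        raise ValueError(f"ImageNet wnids start with 'n', got {wnid!r}")
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 8 digits after 'n', got {wnid!r}")
    return pos, int(digits)


def make_wnid(pos: str, offset: int) -> str:
    """Concatenate a part-of-speech letter and a zero-padded offset."""
    return f"{pos}{offset:08d}"


if __name__ == "__main__":
    pos, offset = parse_wnid("n02084071")  # "dog, domestic dog, Canis familiaris"
    print(pos, offset)
    print(make_wnid(pos, offset))
```

Keeping the offset as an integer is convenient for lookups, but the zero padding must be restored when the identifier is written back out, which is why the two helpers are paired.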
=== Image format === The images were scraped from online image search (Google, Picsearch, MSN, Yahoo, Flickr, etc.) using synonyms in multiple languages. For example: German shepherd, German police dog, German shepherd dog, Alsatian, ovejero alemán, pastore tedesco, 德国牧羊犬. ImageNet consists of images in RGB format with varying resolutions. For example, in the "fish" category of ImageNet 2012, the resolution ranges from 4288 x 2848 to 75 x 56. In machine learning, these are typically preprocessed into a standard constant resolution, and whitened, before further processing by neural networks. For example, in PyTorch, ImageNet images are by default normalized by dividing the pixel values so that they fall between 0 and 1, then subtracting [0.485, 0.456, 0.406], then dividing by [0.229, 0.224, 0.225]. These are the per-channel means and standard deviations for ImageNet, so this whitens the input data. === Labels and annotations === Each image is labelled with exactly one wnid. Dense SIFT features (raw SIFT descriptors, quantized codewords, and coordinates of each descriptor/codeword) for ImageNet-1K were available for download, designed for bag of visual words. The bounding boxes of objects were available for about 3000 popular synsets with on average 150 images in each synset. Furthermore, some images have attributes. They released 25 attributes for ~400 popular synsets: Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow; Pattern: spotted, striped; Shape: long, round, rectangular, square; Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet. === ImageNet-21K === The full original dataset is referred to as ImageNet-21K. ImageNet-21k contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k. The full ImageNet-21k was released in fall 2011, as fall11_whole.tar. There is no official train-validation-test split for ImageNet-21k. 
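The per-channel normalization described under "Image format" can be sketched in plain Python as follows. This only mirrors the arithmetic given in the text (scale to the 0 to 1 range, subtract the mean, divide by the standard deviation); the function name normalize_pixel and the assumption of 8-bit input values in the range 0 to 255 are illustrative and do not reproduce any library's API.

```python
# Per-channel (R, G, B) statistics quoted in the text above.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one 8-bit RGB pixel (values 0..255, an assumption here).

    Each channel is scaled to [0, 1], then shifted by the channel mean
    and divided by the channel standard deviation.
    """
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


if __name__ == "__main__":
    print(normalize_pixel((0, 0, 0)))        # darkest pixel: all negative
    print(normalize_pixel((255, 255, 255)))  # brightest pixel: all positive
```

In practice the same arithmetic is applied to whole image arrays at once rather than pixel by pixel; the scalar version is used here only to make the three steps visible.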
Some classes contain only 1-10 samples, while others contain thousands. === ImageNet-1K === Various subsets of the ImageNet dataset are used in various contexts, and are sometimes referred to as "versions". One of the most widely used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Each category in ImageNet-1K is a leaf category, meaning that there are no child nodes below it, unlike ImageNet-21K. For example, in ImageNet-21K, there are some images categorized as simply "mammal", whereas in ImageNet-1K, there are only images categorized as things like "German shepherd", since there are no child-words below "German shepherd". === Later developments === In the WordNet that ImageNet was built on, there were 2832 synsets in the "person" subtree. During the 2018–2020 period, the ImageNet team removed the download of ImageNet-21k as they went through extensive filtering of these person synsets. Out of these 2832 synsets, 1593 were deemed "potentially offensive". Out of the remaining 1239, 1081 were deemed not really "visual". The result was that only 158 syn

Vero (app)

Vero (stylized as VERO) is a social media platform and mobile app company. Vero markets itself as a social network free from advertisements, data mining and algorithms. == History == The app was founded by French-Lebanese billionaire Ayman Hariri who is the son of former Lebanese prime minister Rafic Hariri. The name is taken from the Italian word for true. The app launched officially in 2015 as an alternative to Facebook and their popular photo-blogging app Instagram. Within weeks of its release the app surged in popularity although users expressed mixed reports with some feeling confused about how the app worked. Cosplayers were early to adopt the app as their photo-sharing platform of choice, favouring the app's pinch and zoom magnification feature over Instagram's zoom feature. Other creative communities soon followed, and the app became popular with niche groups of makeup artists, tattoo artists, and skateboarders. In March 2018, Vero's popularity surged, partly helped by an exodus from Facebook and Instagram following the Cambridge Analytica data scandal. In the wake of the scandal, Vero devised an advertising campaign aimed at defected Facebook and Instagram users, hoping the app's policies and privacy settings would assuage concerns over sharing personal information on the internet. Within the space of one week, the app went from being a small service, akin to Ello or Peach, to being the most downloaded app in eighteen countries. In December 2020, Vero released its most significant update to date, Vero 2.0 which introduced new features including voice and video calls, game and app posts and bookmarks, and refinements to the UI. In October 2021, Vero introduced their Desktop app (beta) with multiple post options and a re-sizable multi-column feed. 
== Concept and funding == Vero's content feed resembles Instagram's although users can share a wider variety of content and the app has a chronological content feed whereas Facebook and Instagram's feeds are algorithm based. Vero's business plan is also distinct from similar social media apps. Whereas its competitors such as Facebook or Instagram make money from in-app advertising revenue and the sale of user data, Vero's business plan was to invite the first one million users to use the app for free then charge any subsequent users a subscription fee. The app was entirely funded by its founder and generated additional revenues by charging affiliate fees when someone buys a product they find on Vero. == Awards == Vero was recognized at the 2021 Webbys, being named as an Honoree in the Best Visual Design - Aesthetic Category. == Controversies == === Privacy === Vero has faced some criticism over the wording of their manifesto, in particular, the statement "Vero only collects the data we believe is necessary to provide users with a great experience and to ensure the security of their accounts." Because this policy does not explicitly state that the app will not sell data on to third parties some users fear that the need to monetise the app through data might prove too tempting. Users have also complained about not being able to delete their accounts. While this was never the case, the option was hidden deep in the app's settings. === Russian involvement === Although Vero remains transparent about the app's Russian development team, they have been caught up in concerns about Russian interference on social media platforms. The app's founder Ayman Hariri was quick to dismiss the remarks as xenophobic and defend the nationality of his employees, stating in an interview with Time Magazine; "At the end of the day, where people are from is really not how anybody should judge anyone". 
=== Criticism of the app's founder === Until 2013, Vero's founder Ayman Hariri was deputy CEO and chairman of Saudi Oger, the Saudi Arabian construction company which collapsed in 2017, mired in controversies over the welfare and treatment of its employees. However, Hariri is quick to point out that he divested from the firm in 2014 and that the workers' rights violations occurred after he had left the company.

Dark data

Dark data is data that is acquired through various computer network operations but not used in any manner to derive insights or for decision making. The ability of an organisation to collect data can exceed the throughput at which it can analyse the data. In some cases the organisation may not even be aware that the data is being collected. IBM estimates that roughly 90 percent of data generated by sensors and analog-to-digital conversions never gets used. In an industrial context, dark data can include information gathered by sensors and telematics. Organizations retain dark data for a multitude of reasons, and it is estimated that most companies are only analyzing 1% of their data. Often it is stored for regulatory compliance and record keeping. Some organizations believe that dark data could be useful to them in the future, once they have acquired better analytic and business intelligence technology to process the information. Because storage is inexpensive, storing data is easy. However, storing and securing the data usually entails greater expenses (or even risk) than the potential return profit. In academic discourse, the term dark data was essentially coined by Bryan P. Heidorn. He uses it to describe research data, especially from the long tail of science (the many small research projects), which are not or no longer available for research because they disappear in a drawer without adequate data management. Without such management the data become dark; further reasons include missing metadata annotation, missing data management plans and missing data curators. == Analysis == The term "dark data" very often refers to data that is not amenable to computer processing. For example, a company might have a great deal of data that exists only as scanned page-images. Even the bare text in such documents is not available without something like optical character recognition (OCR), which can vary greatly in accuracy. 
Even with OCR, the significance of each part of the data is unavailable. An obvious example is whether a capitalized word is a name or not, and if so, whether it represents a person, place, organization, or even a work of art. Other examples include bibliographic and other references, data within tables (that may be labeled quite adequately for humans, but not for processing), and countless assertions represented with the full complexity and ambiguity of human language. A lot of unused data is very valuable, and would be used if it could be, but is blocked because it is in formats that are difficult to process, categorise, identify, and analyse. Often the reason that businesses do not use their dark data is the amount of resources it would take and the difficulty of having that data analysed. In other words, the data is "dark" not because it is not used, but because it cannot (feasibly or affordably) be used, given its poor representation. There are many data representations that can make data much more accessible for automation. However, a great deal of information lacks any such identification of information items or relationships, and much more loses it during "downhill" conversion such as saving to page-oriented representations, printing, scanning, or faxing. The journey back "uphill" can be costly. According to Computer Weekly, 60% of organisations believe that their own business intelligence reporting capability is "inadequate" and 65% say that they have "somewhat disorganised content management approaches". == Relevance == Useful data may become dark data after it becomes irrelevant because it was not processed fast enough. This is called "perishable insights" in "live flowing data". For example, if the geolocation of a customer is known to a business, the business can make an offer based on the location; however, if this data is not processed immediately, it may be irrelevant in the future. According to IBM, about 60 percent of data loses its value immediately. 
== Storage == According to the New York Times, 90% of energy used by data centres is wasted. If data were not stored, energy costs could be saved. Furthermore, there are costs associated with the underutilisation of information and thus missed opportunities. According to Datamation, "the storage environments of EMEA organizations consist of 54 percent dark data, 32 percent redundant, obsolete and trivial data and 14 percent business-critical data. By 2020, this can add up to $891 billion in storage and management costs that can otherwise be avoided." The continuous storage of dark data can put an organisation at risk, especially if this data is sensitive. In the case of a breach, this can result in serious repercussions. These can be financial or legal, and can seriously hurt an organisation's reputation. For example, a breach of customers' private records could result in the theft of sensitive information, which could lead to identity theft. Another example could be the breach of the company's own sensitive information, for example relating to research and development. These risks can be mitigated by assessing and auditing whether this data is useful to the organisation, employing strong encryption and security and, finally, if the data is to be discarded, discarding it in such a way that it becomes unretrievable. == Future == It is generally considered that the value of dark data will rise as more advanced computing systems for analysis of data are built. It has been noted that "data and analytics will be the foundation of the modern industrial revolution". This includes data that is currently considered "dark data" because there are not enough resources to process it. All this data that is being collected can be used in the future to maximise productivity and to help organisations meet consumers' demand. Technology advancements are helping to leverage this dark data affordably.
Furthermore, many organisations do not realise the value of dark data right now; for example, healthcare and education organisations deal with large amounts of data that could create a significant "potential to service students and patients in the manner in which the consumer and financial services pursue their target population".

WomanStats Project

The WomanStats Project is a donor-funded research and database project housed at Brigham Young University that "seeks to collect detailed statistical data on the status of women around the world, and to connect that data with data on the security of states." The WomanStats Database aims to provide a comprehensive compilation of information on the status of women in the world. Coders comb the extant literature and conduct expert interviews to find qualitative and quantitative information on over 300 indicators of women's status in 174 countries with populations of at least 200,000. Access to the online database is free. == History and structure == WomanStats began as an outgrowth of a paper that Dr. Valerie M. Hudson (of the Brigham Young University Political Science department) and one of her graduate students, Andrea den Boer, published in International Security on the association between national security and the abnormal sex ratio in Asia. After the success and influence of their first article (later listed among that journal's top twenty national security articles of all time), Hudson and den Boer did further research on the connection between the status of women and national security, but found that there was no single database that covered the range of topics that they needed for their research. Consequently, they began compiling information on variables regarding the status of women around the world. The database was officially formed in 2001 and grew substantially as more variables were added. The Project went live on the Internet in July 2007. The principal investigators are: Valerie M. Hudson (International Relations), Bonnie Ballif-Spanvill (Psychology, emeritus), and Chad F.
Emmett (Geography), all from Brigham Young University; Mary Caprioli from the University of Minnesota, Duluth (International Relations); Rose McDermott from Brown University (International Relations); Andrea den Boer from the University of Kent at Canterbury in the United Kingdom (International Relations); and S. Matthew Stearmer from the Ohio State University (Sociology; doctoral student). Approximately a dozen undergraduate and graduate students at Brigham Young University and Texas A&M University work at any one time as coders for the project. The coders take the raw quantitative and qualitative data collected in government reports, news articles, research papers, etc. and sort the applicable information on women into categories. They may also implement scales developed by the principal investigators, or that they (the students) themselves have developed. == Database == As of February 2011, the database has 307 variables, covers 174 nations with populations over 200,000, uses 18,015 sources and contains over 111,000 individual data points. All data is referenced to original sources. Not every variable has information for every country: overall, about 70% of country-variable combinations have information. These database coding gaps exist where information is not available or is incomplete, or where variables are not collected and reported by governments or international organizations. At times, information from different sources may be contradictory, and the WomanStats Database records this discrepant information for triangulation purposes. == Users and role of the database == The database is meant to help fill a hole in the extant data on the situation of women around the world. WomanStats data and research have been vetted and/or used by the United Nations, the United States Department of Defense, the Central Intelligence Agency, and the World Bank.
Their data and research were also used by the United States Senate Committee on Foreign Relations in crafting the International Violence Against Women Act. The Inter-Agency Network on Women and Gender Equality (IANWGE) of the United Nations has stated that the WomanStats project "filled a major gap in the availability of data on women" (2007). Victor Asal and Mitchell Brown, researchers not affiliated with WomanStats, stated in an article published in Politics and Policy that "one of the most significant challenges of cross-national empirical studies of the prevalence of interpersonal violence is the paucity of available data, particularly reliable data," and that "WomanStats has allowed for an important first glimpse at analyzing the factors related to interpersonal violence." They conclude by stating that "Our findings suggest that, in the same way that larger disciplinary resources have invested in interstate and intrastate war, disciplinary resources need to be expended in creating a data set exploring interpersonal violence. Until the rights and the lives of women and children are taken as seriously as the survival of states by more proactively collaborating on projects like WomanStats, we will continue to only have a small lens through which to understand problems like this." Princeton University professor Evan S. Lieberman wrote, "Although data on political regimes and group conflict have been in far greater demand by political scientists than data on gender politics and policies, two gender-related databases provide...examples of innovative HIRDs. Both the Womanstats database project (Hudson et al. 2009) and the Research Network on Gender Politics and the State (RNGS) project (McBride et al. 2008) are well-integrated presentations of quantitative and qualitative data characterizing the quality of gender relations around the world and, in particular, analytic descriptions of the treatment of women."
== Research == The research component of WomanStats focuses on exploring the relationship between the situation of women and the behavior and security of states. Current research initiatives include: exploring the relationship between violent instability and inequity and family law; examining the effect of polygyny and marriage market dislocations on the rise of suicide terrorism; documenting discrepancies between laws on the books and cultural practices on the ground concerning gender issues; and investigating how well the situation of women predicts the peacefulness of nation-states, compared to other variables such as democracy, wealth, and civilization. The Project has published articles in International Security, International Studies Quarterly, Peace and Conflict, Journal of Peace Research, Political Psychology, Cumberland Law Review, and World Political Review, and has a forthcoming book from Columbia University Press.