2021 Symposium

Smart State: Big Data for Community Impact

Host: Princeton University

The New Jersey Big Data Alliance (NJBDA) is an alliance of 17 higher education institutions, as well as industry and government members, that catalyzes collaboration in advanced computing and data analytics research, education and technology.

The NJBDA Annual Symposium brings together academia, government and industry from across the state and beyond, to share information on the latest innovations, research and future directions in Big Data.

Our 2021 event will showcase how our state, cities, and communities use big data to improve equity, sustainability, and prosperity for community members.

The 2021 Symposium will include academic research sessions with presentations on current research in Smart Cities, AI, Machine Learning and Big Data.

For questions, please contact Spencer Reynolds, Princeton University, spencerr [at] princeton.edu.

Agenda:

Day 1, Thursday, April 29

Zoom link: http://bit.ly/njbda-symposium

9:00 am: Welcome

Margaret Brennan-Tonetta, President, New Jersey Big Data Alliance

Andrea Goldsmith, Dean, School of Engineering and Applied Science, Princeton University

Beth Noveck, Chief Innovation Officer, State of New Jersey

9:15 am: Keynote – “New Tools and New Frontiers for Community Impact through Data”


Stephen Goldsmith, Derek Bok Professor of the Practice of Urban Policy, Harvard Kennedy School

While the past year has presented many challenges for cities and communities, new technologies and data innovations have emerged in response to the multiple crises. Although challenges remain in 2021, opportunities abound for smarter cities in a post-Covid country.

10:00 am: Panel discussion – “Smart Data for Communities: Vision and Implementation”

Moderator: E. Steven Emanuel, CGI Consulting, Former CIO, State of New Jersey, Former CIO, City of Newark, NJ

Panelists: Bernadette Kucharczuk, Jersey City; Tim Moreland, Chattanooga; Ruthbea Yesner, IDC

Municipalities and states are leveraging data for community impact in myriad ways, often within a vision or framework providing context and priorities. This session will explore several municipal cases, and how the vision of data-enabled government co-evolves with the implementation, to address opportunities and challenges.

11:00 am: Workshops (two concurrent tracks)

Smart Cities practitioners track

Stay in the original Zoom link for Track 1: http://bit.ly/njbda-symposium

Big Data Workshop: Using COVID Data Intelligence Programs to make critical decisions in NJ cities and counties

Workshop leaders: George Avirappattu, Kean University; Navin Vembar, Camber; Margaret Piliere, Madhu Chandran Sreekumuran Nair and E. Steven Emanuel, CGI.

This workshop will exemplify academic-industry partnerships. It will focus on the use of data intelligence to provide states, counties, and cities with critical tools during disasters, and will run in roughly two parts. First, we will demonstrate how data intelligence helps counties and municipalities model morbidity, mortality, hospitalization and ICU bed rates, transience, vaccinations, and school openings/closings. Second, we will examine the socioeconomic, structural, and environmental factors that explain the varying impact of COVID-19 on different communities. We will inspect how these factors have common threads that run across communities and make some more vulnerable than others to natural disasters, including COVID-19.

Student Projects track

Zoom link for Track 2: http://bit.ly/njbda-track2

Experiential Learning through Capstones: Opportunities and Challenges

Workshop leaders: Rashmi Jain, Montclair State University; Adam Spunberg, AB InBev

The workshop will highlight the partnership between universities and industry that is necessary to provide experiential opportunities to students in the form of capstone courses. Capstone courses are an integral part of the culminating experience of academic programs and are immensely helpful in preparing students for careers in industry. Experiential learning is learning by doing. High-impact learning practices require substantial student participation and effective faculty advising. Employers and hiring managers value college candidates with experiential learning across individual disciplines in real or near-real-world settings. An approach that has worked effectively is to involve industry stakeholders in defining the problem scope and to work closely with them on the students' deliverables.

The workshop will cover research on experiential learning and how its unique characteristics lend themselves to capstone courses in business. We will share experiences of successful partnerships, what works and what does not, issues and challenges, and lessons learned. The roles and responsibilities of students in deriving the best out of such experiences will also be covered.

Student and Faculty Poster presentations (all day)


More information can be found on the Student Research page and the communication server on Discord. These presentations will be available asynchronously.

Matthew Schofield (presenter), Shen-Shyang Ho and Ning Wang, Rowan University

In shared mobility systems research, there is increasing interest in the use of reinforcement learning (RL) techniques for improving the resource supply balance and service level of systems. The goal of these techniques is to produce a user incentivization policy that encourages users of a shared mobility system (e.g., bikeshare systems) to slightly alter their travel behavior in exchange for a small monetary incentive. These slight changes in user behavior are intended, over time, to increase the service level of the shared mobility system and improve its profit margin. Reinforcement learning techniques are gaining popularity as an approach to this problem because they can utilize deep learning for tasks that require many actions and produce a cumulative, noisy reward signal. A reinforcement learning policy can offer many incentives to users and then receive the service level of the target mobility system over time as a reward signal. We present an analysis and results of our extensive study on how different frameworks for representing a shared mobility system affect reinforcement learning performance for user incentivization, in terms of service level. We utilize bikeshare trip data from Washington D.C.'s Capital Bikeshare system between 2015 and 2019 to produce data-driven simulations for experimentation. In our analysis, we show how service level is affected by user volume and mobility needs, resource supply availability, and incentivization budget. Further, we analyze the effectiveness of various reinforcement learning algorithms and framework approaches.
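
To make the incentive-offering loop concrete, here is a minimal sketch in the spirit of the abstract, assuming a toy single-station environment with tabular Q-learning; the station dynamics, reward shaping, and parameters are all illustrative inventions, not the authors' framework.

```python
import numpy as np

# Toy sketch (not the authors' framework): a single bikeshare station whose
# inventory drifts with demand; the agent offers an incentive level that
# nudges some users to return bikes here. Reward is the service level
# (fraction of rental requests that could be served) minus incentive cost.

rng = np.random.default_rng(0)
CAPACITY = 20
N_ACTIONS = 3          # incentive level: 0 = none, 1 = small, 2 = large

q_table = np.zeros((CAPACITY + 1, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(inventory, action):
    rentals = rng.poisson(3)                   # demand to take bikes out
    returns = rng.poisson(2 + action)          # incentives attract returns
    served = min(rentals, inventory)
    new_inv = int(np.clip(inventory - served + returns, 0, CAPACITY))
    service_level = served / rentals if rentals else 1.0
    reward = service_level - 0.05 * action     # incentive has a monetary cost
    return new_inv, reward

inv = CAPACITY // 2
for t in range(50_000):
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(q_table[inv].argmax())
    new_inv, r = step(inv, a)
    q_table[inv, a] += alpha * (r + gamma * q_table[new_inv].max() - q_table[inv, a])
    inv = new_inv

print("Greedy incentive per inventory level:", q_table.argmax(axis=1))
```

The actual study replaces this toy loop with data-driven simulations built from Capital Bikeshare trip data and compares multiple RL algorithms and system representations.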

Mehmet Turkoz (presenter) and Rajiv Kashyap, William Paterson University

Support Vector Data Description (SVDD) is a support vector-based learning algorithm used to detect anomalies. SVDD obtains a spherically shaped boundary around the target data by transforming the data into a high-dimensional space, and it classifies a point by checking whether it falls inside or outside that boundary using kernel distance. Although kernel distance improves the performance of SVDD, kernel distance alone is insufficient, especially for complex data. A well-known way to overcome this limitation is to consider the density distribution. In most real-life applications, each data point has a different importance based on the density of the data: points in dense areas are considered more important than points in sparse areas. Thus, if every data point is treated equally without considering density, the obtained boundary will describe the data poorly. Therefore, to increase the efficiency of traditional SVDD, we propose a new SVDD procedure that accounts for the density of a data set by assigning a weight to each data point. The effectiveness of the proposed procedure is demonstrated with various simulation studies and real-life datasets.
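
A rough sketch of the density-weighting idea, using scikit-learn's one-class SVM as a stand-in for SVDD (with an RBF kernel the two formulations are closely related); the bandwidth, weighting scheme, and data below are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.svm import OneClassSVM

# Points in dense regions get larger weights, so the learned boundary
# follows the density of the data rather than treating all points equally.

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0, 0.3, size=(200, 2)),   # dense cluster
    rng.normal(3, 1.0, size=(50, 2)),    # sparse cluster
])

# Estimate local density with a kernel density estimator.
kde = KernelDensity(bandwidth=0.5).fit(X)
density = np.exp(kde.score_samples(X))
weights = density / density.sum() * len(X)   # mean weight of 1

model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
model.fit(X, sample_weight=weights)          # density-aware boundary

flags = model.predict(X)                     # +1 inside boundary, -1 outside
print("flagged as anomalies:", int((flags == -1).sum()))
```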

Research Proposals:

Samantha Nievas, Kean University; Nick Marshall, Joseph Kajon, Seton Hall University

The impact of COVID-19, measured in terms of positive case and mortality rates per 100,000, is known to be greater in communities with greater social vulnerability. Our research uses GIS mapping and statistical analysis to examine this relationship in Chicago, IL. The results suggest that while there may be a strong association in the earlier stages of an epidemic, the association weakens as the pandemic continues, indicating that even those with relatively low social vulnerability are eventually impacted by mass pandemics.
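
As a hypothetical illustration of the kind of statistical check described (not the project's actual data or code), one can compare the vulnerability-to-case-rate correlation across two periods; all numbers below are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic illustration of the reported pattern: the association between
# social vulnerability and case rates is strong early in an epidemic and
# weaker later, once spread reaches all communities.

rng = np.random.default_rng(3)
svi = rng.uniform(0, 1, size=77)   # e.g., one value per Chicago community area

early_cases = 500 * svi + rng.normal(scale=50, size=77)        # strongly SVI-driven
late_cases = 800 + 150 * svi + rng.normal(scale=200, size=77)  # broad spread, weaker link

for label, cases in [("early", early_cases), ("late", late_cases)]:
    r, p = pearsonr(svi, cases)
    print(f"{label}: r = {r:.2f}, p = {p:.3g}")
```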

Trevor Carr, Xiang Li, Jennifer Simon, Rutgers University

Since the first block was mined in 2009, Bitcoin has emerged as the world's standard cryptocurrency, trading at prices ranging from the mere hundreds to over sixty thousand USD (Hicks, 2020). This tremendous growth in valuation has also spurred many questions surrounding the predictability and "decentralized" nature of Bitcoin. While advertised as a decentralized alternative to traditional monetary policy, the extent of this detachment from global monetary policy appears to be quite flimsy. Previous research has identified associations of both global monetary policy and business cycle fluctuations with the price volatility Bitcoin has seen in its short trading life (Corbet et al., 2017). Specifically, Bitcoin's valuation is heavily impacted by global quantitative easing (QE) measures, centralized interest rate policy, business cycle fluctuations, price stability, supply and demand, and other market-driving cryptocurrencies. Despite these identifiable indicators, many investors remain skeptical about the stability this asset has to offer. This skepticism is primarily rooted in the lack of research and verifiable information surrounding this incredibly young, groundbreaking asset.

Through the use of a Support Vector Regression (SVR) supervised learning algorithm, the previously stated economic features will be quantitatively related and analyzed to perform Bitcoin price forecasting. As Bitcoin continues to become a more widely accepted and traded commodity, greater information regarding its volatility and decentralized nature is imperative to supporting its growth as an investment opportunity and currency alternative. This study may also uncover additional information regarding the impact of modern monetary policy on fostering novel markets, such as the cryptocurrency market. It is the intention of this research to forecast Bitcoin's pricing trajectory and uncover the macroeconomic links that influence this exciting new commodity.
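
A minimal sketch of the proposed modeling step, assuming synthetic stand-ins for the macro features the proposal names (QE volume, interest rates, a demand proxy); the real study would use observed series and a proper time-series validation scheme.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Illustrative only: synthetic features and a synthetic price series.
rng = np.random.default_rng(2)
n = 500
qe_volume = rng.normal(size=n)
interest_rate = rng.normal(size=n)
demand_proxy = rng.normal(size=n)
X = np.column_stack([qe_volume, interest_rate, demand_proxy])
price = 3 * qe_volume - 2 * interest_rate + demand_proxy + rng.normal(scale=0.5, size=n)

train, test = slice(0, 400), slice(400, None)      # chronological split
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[train], price[train])

pred = model.predict(X[test])
rmse = float(np.sqrt(np.mean((pred - price[test]) ** 2)))
print(f"out-of-sample RMSE: {rmse:.3f}")
```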

Yeonho Choi, Stevens Institute of Technology

This study focuses on the development of a big data platform, particularly a platform for urban plastic waste disposal and recycling. Large volumes of plastic are used and discarded in most big cities around the world, yet there is comparatively little study of plastic disposal and recycling. This study investigates the current status of plastic waste management and explores how to develop such a platform.

12:30 pm: End of Day 1

Day 2, Friday, April 30

Zoom link: http://bit.ly/njbda-symposium

9:00 am: Opening and recap of Day 1

Margaret Brennan-Tonetta, Executive Director, New Jersey Big Data Alliance

Piyushimita (Vonu) Thakuriah, Dean, Bloustein School of Planning and Public Policy, Rutgers University

9:15 am: Keynote – “Power to the Public: The Promise of Public Interest Technology”


Tara Dawson McGuinness, Fellow and Senior Adviser of the New Practice Lab, New America

The events of the past year have demonstrated the important role that data and technology play in everything from understanding the spread of a global pandemic to tracking how well governments are doing at reaching people with services, from unemployment insurance and stimulus checks to vaccine appointments. This presentation will build on the ideas in Tara McGuinness and Hana Schank's new book, Power to the Public: The Promise of Public Interest Technology, making the case that governments and nonprofits need new methods and data tools to tackle the complexities of our time and truly deliver for the public in an equitable way.

10:00 am: Panel discussion – “Smart Data to Illuminate Community Grand Challenges”

Moderator: Piyushimita (Vonu) Thakuriah, Dean, Bloustein School of Planning and Public Policy, Rutgers University

Panelists: Will Payne, Rutgers University; Radha Jagannathan, Rutgers University; Carl Gershenson, Princeton University

What are the grand challenges of urban communities today, and what are the key dynamics within these challenges? This session will explore the innovative ways that university research faculty are using big data to understand these challenges, and point the way to effective and efficient solutions.

11:00 am: Academic Research Tracks (two concurrent tracks)

Smart Cities (Room 1)

Stay in the original Zoom link for Room 1: http://bit.ly/njbda-symposium

Moderated by Forough Ghahramani, Associate Vice President, Edge

Haodi Jiang (presenter), Jason T. L. Wang, Ohad Ben-Shahar, Jihad El-Sana and Haimin Wang; New Jersey Institute of Technology

Space weather is a term used to describe changing environmental conditions in the solar system caused by eruptions on the Sun’s surface such as solar flares. Understanding and forecasting of solar eruptions is critically important for national security and for the economy since they are known to have adverse effects on critical technology infrastructure such as satellites and power distribution networks. Space weather analytics is an emerging interdisciplinary field, which aims to (i) understand the onset of solar eruptions and assess space weather effects on Earth through big solar and space data analysis, and (ii) perform near real-time long-range predictions of extreme space weather events including solar flares, coronal mass ejections (CMEs) and solar energetic particles (SEPs) as well as solar wind and geomagnetic storms by using advanced artificial intelligence (AI) techniques.

Here we present a big data-enabled, AI-powered, community-driven cyberinfrastructure for performing space weather analytics. There are three interrelated tasks: (i) identifying, detecting, tracking, and extracting patterns in solar and space data; (ii) synthesizing artificial solar images for studying solar activity in multiple solar cycles; and (iii) predicting solar eruptions and space weather events. We describe a database and tools we are developing for accomplishing these tasks.

Dylan Perry (presenter), Ning Wang and Shen-Shyang Ho, Rowan University

According to Statista's prediction, the number of Internet of Things (IoT) devices will exceed 75 billion by the year 2025. A huge amount of sensory data is generated by these devices every day. If this data can be used effectively, the performance of existing systems can be improved. However, achieving such a goal is non-trivial due to the limited data on any single IoT device and the non-IID (not independent and identically distributed) data generated across multiple IoT devices. To study this issue, several federated energy demand prediction methods were tested on an Electric Vehicle (EV) charging station network and compared with baselines.

Performance across machine learning models was compared to showcase the increase in accuracy that the proposed method provides. The results show a reduction in prediction error over other state-of-the-art model structures, e.g., LSTM. A time-domain clustering algorithm is then applied to break the region of charging stations into subsections and group stations by usage pattern similarity. The charging stations are divided into sections that allow for the greatest increase in accuracy over training each station individually. Furthermore, the proposed model aggregation method allows models trained on local station groups to be combined for an even greater improvement compared with dataset aggregation.

The results that will be presented imply that, with the proposed machine learning model structure, charging stations can anticipate the energy that vehicles will need ahead of time with less error.
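
As a sketch of the federated idea (not the paper's exact method or models), each station can fit a local model on its own data and share only parameters, which a coordinator averages in a FedAvg-style step; the linear models and toy data below are assumptions for illustration.

```python
import numpy as np

# Each charging station fits a local linear model; only model weights are
# shared and averaged, never the raw local data. Toy data; the paper's
# models (e.g., LSTMs) and grouping by usage similarity are not shown.

rng = np.random.default_rng(4)

def local_fit(X, y):
    # ordinary least squares as a stand-in for local training
    return np.linalg.lstsq(X, y, rcond=None)[0]

stations = []
true_w = np.array([2.0, -1.0, 0.5])
for _ in range(8):                                   # 8 stations, small local datasets
    X = rng.normal(size=(40, 3))
    y = X @ (true_w + rng.normal(scale=0.2, size=3)) + rng.normal(scale=0.1, size=40)
    stations.append((X, y))

local_weights = [local_fit(X, y) for X, y in stations]
sizes = np.array([len(y) for _, y in stations], dtype=float)
global_w = np.average(local_weights, axis=0, weights=sizes)  # FedAvg step

print("federated estimate:", np.round(global_w, 2))
```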

Ning Wang (presenter) and Jie Li, Rowan University

Public transportation accounts for a large share of transportation sector emissions. Replacing existing buses with electric buses (E-Buses) is regarded as a major pathway to reducing petroleum use, meeting air quality standards, improving public health, and achieving greenhouse gas emissions reduction goals. Although promising, the charging of E-Buses has an impact on the electric distribution system because they consume a large amount of electrical energy, and this demand for electrical power can lead to extra-large and undesirable peaks in electrical consumption. To address this impact on the reliable operation of the electric distribution system, E-Bus charging regulation has to be implemented. The objective of this abstract is to explore a machine learning-based scheduling scheme to coordinately manage the electric distribution system demand by flattening the energy consumption curve and minimizing the electric energy cost, while guaranteeing the E-Buses' charging needs and the building occupants' energy needs and comfort levels. The coordinated scheduling scheme will leverage the energy management and E-Bus charging flexibilities of grid-interactive efficient buildings (GEBs), and will account for realistic E-Bus charging requirements such as departure state-of-charge (SoC), the GEB's scheduling flexibility, and the utility electricity rate.
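
To illustrate the flattening objective (not the paper's ML-based scheduler), here is a minimal greedy "valley filling" sketch that pushes each bus's charging energy into the lowest-load hours before its departure; the load profile, charger limit, and bus requirements are invented for the example.

```python
import numpy as np

# Schedule each E-Bus's charging energy into the hours where building load
# is lowest, subject to a per-hour charging limit and each bus's departure
# deadline. All inputs are illustrative.

building_load = np.array([60, 55, 50, 48, 50, 60, 80, 95, 100, 98, 90, 85,
                          80, 78, 80, 85, 95, 110, 115, 105, 90, 75, 68, 62], float)
charger_kw = 50.0                                    # max charging power per hour
buses = [  # (energy needed in kWh, departure hour)
    (120.0, 6), (90.0, 8), (150.0, 18),
]

schedule = np.zeros_like(building_load)
for energy, deadline in buses:
    remaining = energy
    while remaining > 1e-9:
        total = building_load + schedule
        # pick the lowest-load feasible hour before departure with spare capacity
        candidates = [h for h in range(deadline) if schedule[h] < charger_kw]
        h = min(candidates, key=lambda h: total[h])
        add = min(charger_kw - schedule[h], remaining)
        schedule[h] += add
        remaining -= add

print("peak without EVs:", building_load.max())
print("peak with scheduled charging:", (building_load + schedule).max())
```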

Michael Bell, New Jersey City University

In contrast to "smart cities" or "sustainable cities", only recently has attention focused on scholarship relevant to smart sustainable cities, and even less so on the evolution of law and policy within that framework. Smart sustainable cities are inclusive cities. The goal of inclusivity is not only critical to the concerns of equity and fairness, but is key to the development of resilient law and policy. Viewing the city as a socio-ecological system, the smart sustainable city is seen as an urban planning strategy within that system. Ultimately, the goal is to achieve a balanced socio-ecological system. Greater urbanization can lead to imbalance, caused by environmental harms, social injustice, social hazards, etc. In classic planning theory, public action is justified by the identification of public norms, which are frequently rearticulated, giving content and legitimacy to law and policy. The ubiquitous use of new information and communications technology (ICT) and urban computing innovations is a public action which also should be justified, based on its contribution to environmental and socio-economic needs and concerns as perceived by citizens. Yet the smart city agenda masks a possible bias: local governments outsource policy by having to rely on various third parties to deploy data analytics. Utilization of these algorithms may not reflect the actual priorities of citizens. However, when local government is inclusive and engages with diverse populations, greater contextualization of law and policy can accommodate sustainable urban development.

Deep Patel (presenter), Mohammad Jalayer, Abdelkader Souissi, Ghulam Rasool and Nidhal Carla Bouaynaya; Rowan University

In recent years, identifying road users' behavior and conflicts at intersections has become an essential data source for evaluating traffic safety. This study developed an innovative artificial intelligence (AI)-based video analytic tool to assess intersection safety using surrogate safety measures. Surrogate safety measures (e.g., post-encroachment time and time-to-collision) are extensively used to identify future threats, such as rear-end collisions arising from interactions between vehicles and other road users. To extract trajectory data, the proposed work integrates a real-time AI detection algorithm, YOLOv5, with tracking by the Deep SORT algorithm. Thirty minutes of high-resolution video data were collected from a busy signalized intersection in Morristown, New Jersey. Non-compliance behaviors, such as red-light running and pedestrian jaywalking, are captured to better understand risky behaviors at intersections. The proposed approach achieved an accuracy between 92% and 97% in detecting and tracking road users' trajectories. The results also demonstrated that the developed tool provides valuable information for engineers and policymakers to develop and implement effective countermeasures to enhance intersection safety.
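
To show the kind of surrogate measure involved, here is a minimal sketch of a post-encroachment time (PET) computation over toy trajectories; the conflict-zone radius, frame rate, and trajectories are assumptions, and in the study the trajectories come from YOLOv5 detections linked by Deep SORT.

```python
import numpy as np

# PET is the gap between the first road user leaving a conflict zone and
# the second one entering it. Trajectories are toy (frame, x, y) arrays.

FPS = 30.0
CONFLICT_RADIUS = 2.0                      # metres around the conflict point

def times_in_zone(traj, conflict_xy):
    """Return (first frame in zone, last frame in zone) or None."""
    d = np.hypot(traj[:, 1] - conflict_xy[0], traj[:, 2] - conflict_xy[1])
    inside = np.flatnonzero(d <= CONFLICT_RADIUS)
    if inside.size == 0:
        return None
    return traj[inside[0], 0], traj[inside[-1], 0]

def pet_seconds(traj_a, traj_b, conflict_xy):
    a, b = times_in_zone(traj_a, conflict_xy), times_in_zone(traj_b, conflict_xy)
    if a is None or b is None:
        return None
    first, second = (a, b) if a[0] <= b[0] else (b, a)
    return max(0.0, (second[0] - first[1]) / FPS)   # entry of 2nd minus exit of 1st

# toy trajectories crossing the same point at different times
frames = np.arange(120)
vehicle = np.column_stack([frames, frames * 0.5 - 20, np.zeros_like(frames, float)])
pedestrian = np.column_stack([frames, np.zeros_like(frames, float), frames * 0.3 - 25])

print("PET:", pet_seconds(vehicle, pedestrian, conflict_xy=(0.0, 0.0)), "seconds")
```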

General AI/ML (Room 2)

Zoom link for Room 2: http://bit.ly/njbda-track2

Moderated by Hieu Nguyen, Professor, Rowan University

Oliver Alvarado Rodriguez (presenter), Zhihui Du and David Bader; New Jersey Institute of Technology

Exploratory graph analytics is a much sought-after approach to help extract useful information from graphs. One of its main challenges arises when the size of the graph exceeds the memory capacity that a typical computer can handle. Solutions must then be developed to allow data scientists to efficiently handle and analyze large graphs in a short period of time, using machines that have the capacity to handle massive file sizes. Arkouda is a software package under early development, created with the intent to bridge the gap between massive parallel computations and data scientists wishing to perform exploratory data analysis (EDA). The communication system between the Chapel back-end and the Python front-end creates an easy-to-use interface that does not require knowledge of the underlying Chapel code, and instead allows data scientists to use the simple Python front-end to carry out all their large-file and graph EDA needs. In this work, a graph data structure is designed and implemented in the Arkouda framework at both the Chapel back-end and the Python front-end. The main attraction of this data structure is its ability to occupy less memory space and perform efficient adjacency edge searching. A parallel breadth-first search (BFS) algorithm is also presented to demonstrate how easily one can implement parallel algorithms in Arkouda to increase EDA productivity with graphs. Lastly, real-world graphs from different domains, such as biology and social networks, are utilized to evaluate the efficiency of the graph data structure and the BFS algorithm. The results obtained from this benchmarking show that the Arkouda overhead is almost negligible and that data scientists can utilize Arkouda for large-scale graph analytics. This work can help further bridge the gap between high-performance computing (HPC) software and data science to create a framework that is straightforward for all data scientists to use. All of the code in this project and in Arkouda is open source and can be found on GitHub. This is joint work with Mike Merrill and William Reus. We acknowledge the support of National Science Foundation grant award CCF-2109988.
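
For readers unfamiliar with the algorithm being parallelized, here is a plain-Python, level-synchronous BFS; it illustrates only the frontier-expansion pattern and does not use Arkouda's actual API.

```python
from collections import defaultdict

# Generic level-synchronous BFS: each pass expands one whole frontier.
# In the Chapel back-end, the work inside each pass runs in parallel.

def bfs_levels(edges, source):
    adj = defaultdict(list)
    for u, v in edges:              # undirected graph
        adj[u].append(v)
        adj[v].append(u)

    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(bfs_levels(edges, source=0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```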

Shamoon Siddiqui (presenter), Ghulam Rasool and Ravi Ramachandran, Rowan University


Natural language processing is a broad field that encompasses several sub-tasks. One problem that has gained visibility over the past several years is sentiment analysis: the process of determining the attitude of an author towards some subject across some spectrum, typically "positive" or "negative," by analyzing the textual information. Whereas the field started with simple counting of words with certain characteristics, it has grown in complexity with the advent of deep learning and neural network-based language models. Typically, the datasets used to train and evaluate these models consist of text with appropriate labels, such as movie reviews with an accompanying star rating. However, the applicability of those results to other scenarios, such as unstructured or natural text, has not been clear. In this paper, we demonstrate a clear and simple case showing that sentiment analysis is fundamentally unsuitable for natural text. We consider state-of-the-art black-box models developed and hosted by three of the largest companies in this field: Amazon, Google and IBM.
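
To illustrate the "simple counting of words" baseline the abstract mentions (the lexicon and examples here are invented, and this is not the paper's evaluation), note how easily negation defeats counting:

```python
import re

# A minimal lexicon-based sentiment scorer in the word-counting style.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def lexicon_score(text):
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(lexicon_score("a great movie, I love it"))        # 2 -> positive
print(lexicon_score("not good, and I do not love it"))  # 2 -> wrongly positive
```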

Bahar Ashnai (presenter) and Saeed Shekari, William Paterson University

We investigate the key fundraising strategies and competencies that enable nonprofit organizations to utilize the full potential of their resources and achieve desirable fundraising outcomes. We explore the fundraising strategies that empower the critical resources to achieve higher performance within each strategy. We acknowledge the parity of sales and fundraising, in line with the anecdotal evidence attesting to such a resemblance: "ending the stigma: fundraising is sales." We draw upon the science of business-to-business sales and business-to-consumer marketing to develop a strategic nonprofit fundraising framework. Key fundraising resources include dedicated fundraising staff, brand equity, and cultivated external relationships. Staff and resource scarcity is a major challenge for nonprofit organizations. We propose that a nonprofit can follow two major fundraising strategies to promote fundraising performance: increasing the number of donors or increasing the donation dollar amount. We argue that at any point in time, given limited resources, the focal nonprofit should opt for one of these strategies; pursuing both at the same time spreads fundraising resources too thin. We use empirical data to test our suggested framework and the underlying hypotheses. There are over one million entities registered as nonprofit organizations in the US, and their tax returns constitute the Big Data that we are interested in. Our subsample includes longitudinal data on 10,000 nonprofit organizations over ten years (2009-2019), resulting in 100,000 nonprofit-year data lines. Each nonprofit-year reports 350 data fields, creating a database of 35,000,000 nonprofit-year-field values.

Alexsis Wintour (presenter), Sophie Stalla-Bourdillon and Laura Carmichael; Lapin Limited

Independent data stewardship remains a core component of good data governance practice. Yet, there is a need for more robust independent data stewardship models that are able to oversee data-driven, multi-party data sharing, usage and re-usage, which can better incorporate citizen representation, especially in relation to personal data. 

We propose that data foundations – inspired by Channel Islands’ foundations laws – provide a workable model for good data governance not only in the Channel Islands but also elsewhere. These offer a robust workable model for data governance in practice, as they provide: a comprehensive rulebook; a strong, independent governance body; an inclusive decision-making body; a flexible membership; a trust-enhancing technical and organisational infrastructure; and a well-regulated structure. 

We outline eight universal design principles to unite all data foundations: (a) all data are relevant, (b) data stewards are independent, (c) expected standards of good practice for data governance specified by a code of conduct, (d) self-regulation, (e) monitoring is the heartbeat, (f) sustainability, (g) accreditation stimulates market growth, and (h) stakeholder approvals need to be maintained.  

There is an opportunity to advance the wider data institution movement through a legal structure that is ready for use and well suited to the needs of data sharing initiatives, in particular since data foundations incorporate the vital element of independent data stewardship through the statutory role of the guardian.

The principal purpose of this paper is to demonstrate why data foundations are well suited to the needs of data sharing initiatives and to examine how they could be established in practice.

Firas Gerges (presenter), Michel Boufadel, New Jersey Institute of Technology; Hani Nassif, Rutgers University

Environmental impacts of climate change are more observable today, with the increased rate and severity of hurricanes and droughts, and the continuous rise of sea level. To face these challenges, resilience concepts have emerged as a way to enhance communities’ preparedness and capacity to absorb disasters. There are indices for community resilience in general, but they are only relative, based on comparison between entities, and they do not account for the stress level on resilience. In this work, we developed a new approach to quantify the absolute level of resilience for each of the critical community sectors, and subsequently the community overall. Our approach aims to leverage the growth of big data by using records compiled from public sources (datasets, GIS layers, etc.) to capture the Community Intrinsic Resilience Index (CIRI) in a GIS-based web platform. This platform would advance the efforts to fill the gap between resilience research and applications and would enable practitioners to integrate resilience within the planning and design phases of disaster management. We applied the approach to New Jersey counties, and we found that CIRI ranged from 63% to 80%. A post-disaster CIRI (following a scenario of flooding) revealed that two coastal counties would have low resilience due to the reduction of the road area and/or the reduction of the GDP (local economy shut down) to below minimum values.
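
As a hypothetical sketch of how an absolute (rather than relative) index can be computed, the snippet below normalizes each sector indicator against fixed minimum/maximum reference values and averages the results; the indicator names, reference values, and equal weighting are invented for illustration and are not CIRI's actual inputs.

```python
import numpy as np

# Normalize each indicator to [0, 1] against fixed reference bounds, so the
# index is absolute rather than a ranking, then average sectors into an
# overall score. All names and numbers below are invented.

reference = {            # (minimum acceptable, maximum achievable)
    "hospital_beds_per_1k": (1.0, 5.0),
    "road_area_fraction":   (0.05, 0.25),
    "gdp_per_capita_k":     (20.0, 90.0),
}
county = {"hospital_beds_per_1k": 2.8, "road_area_fraction": 0.12, "gdp_per_capita_k": 55.0}

scores = {}
for name, value in county.items():
    lo, hi = reference[name]
    scores[name] = float(np.clip((value - lo) / (hi - lo), 0.0, 1.0))

ciri = float(np.mean(list(scores.values())))
print({k: round(v, 2) for k, v in scores.items()}, "overall index =", round(ciri, 2))
```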

Student Poster presentations (all day)


See Abstracts above in Day 1. More information can be found on the Student Research page and the communication server on Discord. These presentations will be available asynchronously.

12:30 pm: End of Day 2

Thank you for attending the 8th Annual New Jersey Big Data Alliance Symposium! Follow the NJBDA on Twitter (@njbda) and LinkedIn.