Tuesday May 9, 2023

10th Annual NJBDA Symposium

Big Data in FinTech

Host: Seton Hall University

The New Jersey Big Data Alliance (NJBDA) is an alliance of 18 higher education institutions, as well as industry and government members, that catalyzes collaboration in advanced computing and data analytics research, education and technology.

The NJBDA’s annual symposium is New Jersey’s premier conference for big data and advanced computing, consistently attracting 200+ attendees from industry, government, and academia.

This year’s symposium will be held in person at Seton Hall University. Read Seton Hall’s press release.

Seton Hall University is located at 400 South Orange Ave., South Orange, New Jersey (Google map: https://goo.gl/maps/qoTCgk1kV9GENnZL6). South Orange can be reached by car and NJ Transit train and bus.

The conference venue is the University Center (Building 12 on the campus map: https://www.shu.edu/visit/upload/Campus_Map.pdf). Parking will be available on campus as directed at the gate.

For questions, please contact Manfred Minimair, Ph.D., Seton Hall. For sponsorship inquiries, please contact Kaerielle Larsen, Seton Hall, at kaerielle.larsen@shu.edu.

Keynotes

Kjersten Margaret Moody

Chief Data Officer, Prudential Financial

Get Ready to Create the Future

You’ve experienced the latest advancements in data and analytics. You’ve read the headlines about quantum computing, blockchain, generative AI, and more. But what comes next, and what do you need to do to prepare?

By attending this session led by Kjersten Moody, Prudential Financial’s Chief Data Officer, you’ll gain a deeper understanding of how your technical knowledge, leadership skills, and real-world experience can prepare you for the transformation ahead in your career, drive business success, and create growth opportunities.

George Calhoun

Professor and Founding Director of the Quantitative Finance Program, Stevens Institute of Technology

New Jersey has all the elements to build a Fintech version of “Silicon Valley” – except a robust venture development ecosystem. The State is currently ranked 10th in Fintech company formation. This could put incumbents at risk of significant disruption from new tech-enabled players from outside the traditional financial services industry (as has already happened in some segments, such as the exchange sector). The question is: What has to happen for the State to become a world center of Fintech innovation, entrepreneurship and job creation?

Marc Rind

Chief Technology Officer-Data, Fiserv

Agenda

8:30 – 9:30

Registration, Refreshments, Exhibits

Location: Event Lounge and Event Room

9:30 – 10:00

Welcome

  • Matt Hale, President, NJBDA; Associate Professor, Seton Hall University
  • Manfred Minimair, NJBDA Symposium Chair; Professor, Seton Hall University
  • Anthony Loviscek, Professor, Seton Hall University, Academy of Applied Analytics and Technology
  • Katia Passerini, Provost and Executive Vice President, Seton Hall University

Location: Event Room

10:00 – 10:45

Keynote 1: Kjersten Moody

Chief Data Officer of Prudential Financial

(30-minute talk, 15-minute Q&A)

Location: Event Room

10:45 – 11:00

Break, Exhibits (incl. Student Posters), Networking

The 2023 Symposium will include a student research poster session on the topic of Big Data in FinTech. Students from NJBDA member institutions and others are welcome to submit a poster. Topics related to the symposium theme include, but are not limited to, Large Language Models, AI, Finance, Blockchain, Crypto & Digital Currencies, NFTs, and Big Data Business Intelligence. Please submit your poster by April 14th, 2023, with a separate cover page including your name, email, department, institutional affiliation, and the title of the poster, to https://sites.google.com/stockton.edu/2023-njbda-symposium. Bring a 36" x 48" printed copy of the poster to the event on May 9th, 2023.

Location: Event Lounge

  • George Avirappattu, Associate Professor, Kean University (Organizer)
  • Demetrios Roubos, Information Security Officer, Stockton University (Organizer)

11:00 – 12:00

Parallel Sessions:

AI/Machine Learning Applications

Adapting to newer technologies and catering to a wide customer base with customized needs has become the need of the hour, and companies are constantly innovating in response. Successful companies have used AI/ML to design products that suit their customers’ evolving needs. In recent years, machine learning has had a major impact on the finance lending sector by enabling more accurate and faster decision-making through analysis of consumer data, usage trends, and patterns.

This session will focus on the challenges and benefits of using disruptive technologies such as AI and ML for better business intelligence on customers, more informed decision-making, improved risk management, and lower costs.

  • Rashmi Jain, Professor, Montclair State University (Moderator)
  • Bhupinder Bhullar, Managing Director and Co-Founder, Swiss Vault 
  • Jason Cooper, Chief Technology Officer, Paradigm
  • Amitabh Patil, Co-Founder and CTO, Whiz AI

Location: Chancellor’s Suite

Data Assets and Privacy

The workshop considers data as an asset and examines the implications for technological, legal, and privacy frameworks. A case study from the Island of Jersey will be presented, generalizing the legal framework of trusts in Jersey to data assets. The data assets considered in the case study are personal activity data derived from cyclists on the island; the trusts serve as tools to preserve data ownership and privacy. The second in-depth study covers the use of data from electronic health care records. Such data is used for machine learning applications and statistical inference to support medical trials and to assess vulnerabilities of the data. The discussion will address how such applications can comply with privacy laws without compromising patient identity.

  • David Opderbeck, Professor, Seton Hall University (Moderator)
  • Tony Moretta, CEO, Digital Jersey
  • Choudur Lakshminarayan, Teaching Professor, Stevens Institute of Technology

Location: Meeting Room 206

Entrepreneurship in FinTech

The fintech space is one of the fastest-growing areas of entrepreneurship, with new startups popping up every day. In addition, big data has had a significant impact on the fintech industry, revolutionizing the way financial services are delivered and changing the competitive landscape. Panelists will discuss how they are using big data, the current market landscape, opportunities for disruption, and the challenges that come with operating in a regulated environment. Given the recent turmoil in the banking industry, the panelists will also provide insights and perspectives on the current state of funding in fintech.

  • Judith Sheft, Executive Director, New Jersey Commission on Science, Innovation and Technology (Moderator)
  • Nancy Schneier, Chief Revenue Officer and Co-Founder, Vikar Technologies
  • Yingchao Zhang, Head of Global Solutioning, AYR.ai
  • Gabriel L Pauliuc, Chief Data and Technology Officer, HearRWorld, LLC
  • Chisa Egbelu, Co-Founder and CEO, PeduL

Location: Event Room

12:00 – 1:00

Lunch, Exhibits (incl. Student Posters), Networking

Location: Event Lounge and Event Room

1:00 – 2:00

Parallel Sessions:

Workforce Development for FinTech

Talent and human capital are now, and will continue to be, the critical factors of production in the data-driven economy. Technology and lifelong training to maintain the appropriate skills become pre-conditions for participating in this new landscape. By educating, training, and facilitating access to individuals with advanced computing and analytics skill sets, New Jersey can provide a competitive advantage for its employers. This workshop will highlight the specific skills and workforce development needs of the fintech sector to ensure the successful growth of this critical sector for the state economy.

  • William Noonan, Chief Business Development Officer, Choose New Jersey
  • Mark Guthner, Professor of Financial Practice, Rutgers Business School
  • Steven Hunter, Director, AI Center of Excellence and co-organizer of the UBS Pitch Competition, UBS
  • Peggy Brennan-Tonetta, Director, Resource and Economic Development, New Jersey Agricultural Experiment Station, Rutgers, the State University of New Jersey (Moderator)

Location: Event Room

Cryptocurrencies and Risk Management

The panel will cover investigatory and criminal aspects of cryptocurrencies, risk management from the corporate perspective, and relevant underlying technologies such as blockchain. The discussion will explore how blockchain technology facilitates the identification of certain risks by providing positional information about counterparties, while also creating new risks due to the transparency of market participants’ information. AI-powered fraud detection systems will be discussed, with a focus on their ability to analyze large amounts of transaction data in real time to identify suspicious patterns or anomalies that may indicate fraudulent activity. By using machine learning algorithms, these systems can learn over time to identify increasingly sophisticated forms of fraud. Topics will include prevalent cyberattacks targeting financial institutions as well as the evolving risk landscape in the financial sector. The discussion will also delve into security risks and challenges associated with the ongoing digital transformation and the emergence of open and decentralized banking models. 

  • Demetrios Roubos, Information Security Officer, Stockton University (Moderator)
  • Petter Kolm, Professor, NYU Courant 
  • Mert Saglam, Special Agent, US Secret Service
  • Rodney Sunada-Wong, Risk Manager

Location: Chancellor’s Suite

Research Presentations in FinTech

The symposium Research Track features presentations on current applied research in Big Data, AI, and Machine Learning. Presentations are categorized based on specific symposium themes in Big Data and FinTech.

  • Forough Ghahramani, Associate Vice President for Research, Innovation, and Sponsored Programs, Edge and NJBDA Vice President for Research and Collaboration Committee (Research Track Chair)

AI/Big Data Session

Session Moderator: Abhishek Tripathi, PhD, The College of New Jersey

Title: Estimating Blueberry Crop Yield using Deep Learning and Intelligent Drones

Authors:

  • Brandon McHenry, PhD, Rowan University
  • Hieu Nguyen, PhD
  • Thanh Nguyen, PhD

Keywords: Artificial Intelligence, AI, Machine Learning, Deep Learning, Precision Agriculture

Abstract: More than ever, precision agriculture has become an important part of increasing the efficiency of farmers by helping them to not only more effectively use crop inputs like fertilizers, pesticides, and irrigation water, but also to predict crop yield early in the growing season. The latter helps farmers to better negotiate with brokers and retailers and to hire sufficient field workers during harvest. In this presentation, we present new deep-learning models that can be implemented in autonomous aerial drones to help blueberry farmers efficiently and accurately estimate their crop yield during the green fruit stage. Using computer vision, such drones will be able to intelligently assess their surroundings, plan out their flight path over a designated field, and execute their mission of capturing images of a random sample of bushes (from both sides) in order to estimate crop yield.

Our models include three deep learning models used for top-view bush detection, side-view bush detection, and individual berry detection, all based on the YOLOv5 object detection algorithm. We also have an image-processing model for calculating the row direction of a blueberry field. The models were all trained on images of blueberry bushes taken from local farms in South Jersey, consisting of images taken on the ground by a hand-held camera and images captured by an aerial drone. We first use our row detection model to plan out the flight trajectories for the drone. The drone then collects samples of bushes while it is flying between the rows. With our bush detection model, the drone can position itself to capture images of selected bushes. Post-mission, we use our berry count model to detect and count berries on the bushes. In addition, we use our top-view bush detection model to estimate the total number of bushes in a given field, and thus obtain an estimate of the crop yield.

To assess the performance of our berry count model, which is heavily affected by occlusion, we compare predictions from the model against two ground truths. The first is based on what we annotated for a given image (i.e., what we can see), and the second is based on the actual count after manually picking each berry (i.e., what is actually on the bush). Furthermore, we explore the differences in performance between our berry counting model trained using images taken purely from drone cameras and images taken purely from hand-held cameras. This distinction is made because the drone and hand-held datasets differ drastically in resolution (the former is of much lower resolution than the latter), leading to drastically different training metrics and predictions for our berry counting model. We compare results for these two distinct datasets and explore which training set (drone-captured photos, hand-held camera photos, or a mixture of both) yields the most accurate model. From experimental results, we observe that our model was able to detect most of the blueberries when it was trained on either drone-captured photos (precision of 63%, recall of 50%) or a combination of drone and hand-held camera photos (precision of 65%, recall of 51%), with the latter achieving the best performance.
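
As a rough illustration of the inference step described above (not the authors’ actual code), the following sketch loads a custom-trained YOLOv5 model through PyTorch Hub and totals berry detections over a folder of drone images; the weights file, image directory, and confidence threshold are hypothetical placeholders.

```python
# Minimal sketch: count berry detections across drone images with a
# custom-trained YOLOv5 model (hypothetical weights file and paths).
from pathlib import Path
import torch

# Load a custom YOLOv5 model via PyTorch Hub (ultralytics/yolov5 repo).
model = torch.hub.load("ultralytics/yolov5", "custom", path="berry_yolov5.pt")
model.conf = 0.25  # confidence threshold (assumed value)

total_berries = 0
for img_path in Path("drone_images").glob("*.jpg"):
    results = model(str(img_path))      # run detection on one image
    detections = results.xyxy[0]        # tensor: [x1, y1, x2, y2, conf, class]
    total_berries += len(detections)

# A per-bush average multiplied by the estimated bush count (from the
# top-view model) would give a field-level yield estimate, per the abstract.
print(f"Detected {total_berries} berries across sampled images")
```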

Authors:

  • Ahmed Sajid Hasan, PhD, Rowan University
  • Deep Patel, PhD
  • Mohammad Jalayer, PhD

Keywords: Distracted Driving, Artificial Intelligence, Deep Learning, New Jersey

Abstract: Thousands of people die every year in the United States due to distracted driving crashes, with distracted driving accounting for 25% of all fatal traffic crashes in New Jersey. The transportation safety community has implemented various AI techniques to detect distracted driving events inside the vehicle. However, most of those approaches are overt, meaning the subjects are aware that they are being recorded. To close this gap, this study collected video data on distracted driving events from outside the vehicle in the state of New Jersey. The method involved a data collection crew continuously driving through the selected corridors to track driver distraction events by video recording with high-resolution cameras. To analyze the data, driver behavior was classified into nine categories, including non-distracted, tinted/not visible, fidgeting/grooming, radio/reaching for objects, drowsy, eating/drinking/smoking, receiving calls, and handheld cellphone. The recorded videos were preprocessed, and more than 26,000 unique images were annotated using the LabelImg software. The annotated images were used to train and test YOLOv5, an artificial intelligence (AI) object detection model, to detect driver distraction. The suggested model performed reasonably well in predicting distracted driving events, with an accuracy of 90.9%. It is expected that the results obtained from this study will further assist state and local agencies in strengthening law enforcement in New Jersey by better detecting distracted driving behaviors.
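
The sketch below is a hedged illustration of how recorded roadside video might be scored with a custom-trained YOLOv5 model, sampling frames with OpenCV and tallying predicted distraction classes; the weights file, video path, and sampling rate are hypothetical placeholders rather than the authors’ pipeline.

```python
# Minimal sketch: sample frames from a corridor recording and tally the
# distraction classes predicted by a custom-trained YOLOv5 model.
# Weights file, video path, and sampling rate are hypothetical.
from collections import Counter
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="distraction_yolov5.pt")

cap = cv2.VideoCapture("corridor_recording.mp4")
class_counts, frame_idx = Counter(), 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:                               # roughly one frame per second
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)      # OpenCV is BGR; model expects RGB
        for det in model(rgb).xyxy[0].tolist():           # [x1, y1, x2, y2, conf, class]
            class_counts[model.names[int(det[5])]] += 1
    frame_idx += 1
cap.release()

print(class_counts)  # e.g. counts for "handheld cellphone", "non-distracted", ...
```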

Authors:

  • Mehmet Turkoz, PhD, William Paterson University
  • Rajiv Kashyap, PhD

Keywords: Support Vector Data Description, Bayesian, Artificial Intelligence

Abstract: As the country heads towards a recession, potential minority homeowners face even greater challenges than before. In addition to rising interest rates that make it more difficult to buy a home, minorities are being priced out of the housing market by Wall Street firms that seek to profit from foreclosed homes and dips in housing prices during the pandemic. While the problem of minority discrimination has been well established (but not acknowledged by the American Bankers Association), disparities in the cost of living across various metro areas have not been factored into previous analyses. This constitutes another source of variation that can confound any analysis of differences in mortgage rates and loan denials across racial groups. We develop a novel solution to this problem by using a modified Bayesian Support Vector Data Description (SVDD) approach that can serve as an input to an AI algorithm for detecting anomalies such as higher mortgage interest rates for minorities and disadvantaged populations. The Bayesian approach used in this research increases the efficiency of traditional SVDD by considering the density of the data set to assign a weight to each data point and obtain the boundaries of the hypersphere.
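
The modified Bayesian SVDD itself is not reproduced here; as a loose stand-in, the sketch below assigns each record a density-based weight (via a kernel density estimate) and fits scikit-learn’s OneClassSVM with an RBF kernel, which plays a role similar to SVDD, to flag anomalous mortgage records. The features and data are synthetic placeholders.

```python
# Loose stand-in for a density-weighted SVDD: weight each point by its
# estimated local density and fit a one-class SVM with an RBF kernel.
# Feature names and data are hypothetical illustrations only.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Columns: interest rate (%), loan-to-income ratio, local cost-of-living index
X = rng.normal(loc=[6.5, 3.0, 100.0], scale=[0.8, 0.7, 15.0], size=(500, 3))
X = StandardScaler().fit_transform(X)

# Density-based weights: denser regions get more influence on the boundary.
log_dens = KernelDensity(bandwidth=0.5).fit(X).score_samples(X)
weights = np.exp(log_dens)
weights /= weights.sum()

svdd_like = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
svdd_like.fit(X, sample_weight=weights)

labels = svdd_like.predict(X)   # +1 = inlier, -1 = flagged as anomalous
print("Flagged records:", int((labels == -1).sum()))
```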

Authors:

  • Rick Anderson, PhD, Rutgers University, The State University of New Jersey
  • Jim Samuel, PhD
  • Carmela Scala, PhD
  • Parth Jain, PhD

Keywords: Machine Translation, Natural Language Processing, Finance, AI, ML, Sentiment Analysis, language model, Transformers

Abstract: Natural language processing (NLP) is widely used for a variety of value creation tasks ranging from chatbots and machine translation to sentiment and topic analysis. However, most of the advances are implemented within the English language framework, and it will take time and resources to develop comparable resources in other languages. Advances in machine translation have enabled rapid and effective conversion of content in global languages into English and vice versa. This creates potential opportunities to apply English language NLP methods and tools to other languages via machine translation. However, although this idea is powerful, it needs to be validated, and processes and best practices need to be developed and kept updated. It is therefore important to study the behavior of textual data (Samuel et al., 2022).

NLP as a domain has been experiencing unprecedented breakthroughs and an exponential adoption growth rate by businesses, institutions, governments, and individual users. The global NLP market is expected to grow to nearly fifty billion US dollars by 2027, and there have been many notable developments in NLP since late 2021 (NLP, 2022): Apple will provide an open-source reference PyTorch implementation of the Transformer architecture for its products, enabling global developers to effortlessly run Transformer models. Baidu, at the end of 2021, introduced PCL-BAIDU Wenxin (ERNIE 3.0 Titan), a state-of-the-art knowledge-enhanced 260-billion-parameter model for the Chinese language that easily outperformed its predecessors. Most recently, we have seen the release announcements for Meta’s (Facebook) LLaMA, Alphabet’s (Google) PaLM, and OpenAI’s GPT-4, with vision- and language-based multimodal capabilities (LLaMA, 2023; PaLM, 2023; GPT4, 2023). This has been accompanied by a rush of remarkable AI research at a dizzying pace.

Google’s 2021 MUM language model was trained across 75 languages and is an exception to the general single-language models. Google’s VP of Search described MUM as “1,000 times more powerful than BERT” and said that it has “…the potential to transform how Google helps [users] with complex tasks” (Pandu Nayak, 2021). It would be expensive and unfeasible, from a resource availability and allocation perspective, to expect such models to be built and kept updated for every language in the short term. The key question therefore is: Given the NLP advances in one language, can we extend the benefits to a non-native language by machine-translating it into the native language and applying advanced NLP models? To address this, we perform an experiment with a lab-developed Italian text corpus.

Our research in machine translation based NLP solutions is presented as a pilot study using multiple NLP sentiment analysis methods, and it has a few areas that need improvement. We initiated the pilot project with a set of expert-created Italian language sentences. This raises a few issues: first, the dataset was created specifically for this study; second, it is small, especially in the context of current NLP modeling, which uses large quantities of data. We have tested two machine translation models and applied four sentiment analysis methods, employing both lexical and machine learning based sentiment analysis. Thus, we accomplished the main objective of the pilot study, which is to establish a repeatable process for further analysis of machine translation driven NLP solutions.
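
As a rough sketch of a translate-then-analyze pipeline of the kind described (not the exact models or methods used in the study), the example below machine-translates Italian sentences into English with a publicly available MarianMT model and scores the output with one lexical method (VADER) and one transformer-based sentiment model; the model choices are assumptions.

```python
# Sketch of a translate-then-analyze pipeline: Italian -> English machine
# translation followed by lexical (VADER) and transformer sentiment scoring.
# Model names are assumptions, not necessarily those used in the study.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from transformers import pipeline

nltk.download("vader_lexicon", quiet=True)

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-it-en")
ml_sentiment = pipeline("sentiment-analysis")   # default English model
vader = SentimentIntensityAnalyzer()

italian_sentences = [
    "Il mercato azionario ha chiuso in forte rialzo.",   # "The stock market closed sharply higher."
    "Gli investitori temono una nuova recessione.",      # "Investors fear a new recession."
]

for sent in italian_sentences:
    english = translator(sent)[0]["translation_text"]
    lexical_score = vader.polarity_scores(english)["compound"]
    ml_label = ml_sentiment(english)[0]
    print(f"{english!r}: VADER={lexical_score:+.2f}, model={ml_label['label']}")
```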

The financial services industry has adopted numerous AI applications and big data solutions. Notably, NLP methods are used to gauge investor and market sentiment, and perform other kinds of behavioral analysis (Fernández, 2019; Chen, et al., 2020). Our research is expected to support the extension of many English language NLP tools customized for finance and investing to accommodate multiple languages, thus increasing the breadth of input data, leading to improved insights and decision making.

Authors:

  • Malasa Vikram Bidadi Iyengar, PhD, Saint Peter’s University
  • Sri Sarat Chaitanya Gollapalli, PhD
  • Sharath Kumar Jagannathan, PhD
  • Gulhan Bizel, PhD

Keywords: Wealth inequality, Gender disparities, Racial disparities, Financial literacy, Demographic data

Abstract: Persistent gender and racial wealth disparities in the US necessitate further investigation to develop effective strategies for promoting financial equality. This study analyzes demographic and economic data from various counties in the US to examine differences in population size, income levels, and salaried employment across diverse racial and gender groups. The study identifies significant disparities in median income and salaried population among different racial groups, contributing to the widening wealth gap and exacerbating economic inequalities. To address these disparities, targeted money management initiatives are proposed, focusing on improving financial literacy and promoting equal opportunities for women and minority groups. The study employs multiple regression, clustering, and decision tree analysis to identify factors contributing to wealth disparities and to develop targeted interventions based on these factors. The study’s findings can inform policy and practice by highlighting the importance of addressing systemic issues such as discrimination in housing, education, and employment opportunities, and by identifying effective strategies for promoting financial literacy and narrowing the wealth gap. By adopting targeted financial literacy initiatives and implementing comprehensive measures, we can work towards a more inclusive and equitable society.

The Current State of Wealth Inequality:

Wealth inequality in the US is at an all-time high. The Federal Reserve recently reported that the wealth held by the richest 1% of families is now 15 times greater than that of the bottom 50% of households combined. This wealth concentration is not spread evenly across the population: significant wealth gaps affect women and people of color.

Gender Disparities

In the US, women continue to make less money than men do on average, and they are also less likely to own their own businesses or amass wealth over the course of their lifetimes. The National Women’s Law Center estimates that full-time, year-round working women only make 82 cents for every dollar made by men. With Black women earning only 63 cents and Hispanic women getting only 55 cents for every dollar made by white, non-Hispanic men, the wage disparity is much greater for women of color. Due to longer life expectancies and a higher likelihood of caring for others, women are also less likely to have access to employer-sponsored retirement plans and may face greater financial risk.
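
As a rough, synthetic-data illustration of the clustering and decision-tree analysis the abstract above describes (the study’s actual county-level dataset and feature set are not reproduced here), the sketch below clusters hypothetical county records and fits a shallow decision tree to surface factors associated with median income.

```python
# Rough, self-contained illustration of the clustering and decision-tree
# steps described above. The county-level data is synthetic and the column
# names are hypothetical; the study's actual dataset is not shown.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "population": rng.integers(10_000, 1_000_000, n),
    "salaried_share": rng.uniform(0.3, 0.8, n),
    "pct_women": rng.uniform(0.48, 0.53, n),
})
df["median_income"] = 40_000 + 60_000 * df["salaried_share"] + rng.normal(0, 5_000, n)

# Cluster counties on standardized demographic and income features.
X = StandardScaler().fit_transform(df)
df["cluster"] = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X)

# A shallow decision tree to see which features track median income.
predictors = ["population", "salaried_share", "pct_women"]
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(df[predictors], df["median_income"])
print(dict(zip(predictors, tree.feature_importances_.round(3))))
```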

Location: Meeting Room 206 

FinTech Session

Session Moderator: Thanh Trung Nguyen, PhD, Rowan University

Author:

  • Yuliya Guseva, S.J.D., LL.M., Rutgers University, The State University of New Jersey

Keywords: blockchain, litigation, presumption against extraterritoriality

Abstract: Modern financial innovations involve global processes, transactions, and services. These dynamic supranational structures have begun to challenge the longstanding modes of regulation, which were traditionally built around national interests and the principles of national prescriptive and adjudicatory jurisdiction. It has been a historical precept, however, that the U.S. Congress legislates with domestic affairs in mind. This presumption against extraterritoriality is a canon of statutory construction that was embedded in broad principles laid out in seminal Supreme Court cases over several centuries. After decades of relative dormancy in the 20th century, the presumption against extraterritoriality was renewed and refurbished in the Supreme Court jurisprudence of the last 30 years.

In 2010, the Supreme Court decided Morrison v. National Australia Bank, a case that has redefined the presumption against extraterritoriality and produced a particularly strong effect on securities class actions, commodity disputes, and derivatives litigation. Morrison was swiftly followed by two more Supreme Court decisions (Kiobel v. Royal Dutch Petroleum Co., and RJR Nabisco, Inc. v. European Community) that further strengthened the presumption against extraterritoriality and elucidated how it should be applied. Soon thereafter, Morrison’s directives on extraterritorial analysis of securities statutes made their way into the jurisprudence on commodity and derivatives law.

After the presumption against extraterritoriality had percolated in the lower courts, it congealed from a flexible canon of construction into a strict rule on what a domestic, non-extraterritorial transaction must look like in order for plaintiffs to have a private right of action under securities law and commodity law. The newly restrictive rules spread from the Second Circuit Court of Appeals’ decisions to Securities Act, Exchange Act, and Commodity Exchange Act jurisprudence in several other Circuits, with the Tenth Circuit joining in 2021.

The newly restrictive approach is now superimposed on fluid financial innovations produced by emerging and evolving technologies such as blockchain. When a rigid standard is applied to a dynamic set of facts without proper interpretation, the rigid becomes the fragile and transient. Undermining the vaunted dynamism of common law, blunt tools of construction result in suboptimal decisions. If this suboptimality undermines the ability of private plaintiffs to seek recourse in cases involving innovative services and transactions executed through global financial markets, it must be addressed by scholars and, ultimately, by courts and Congress.

This article examines how the current version of the presumption against extraterritoriality affects the role of private litigation in globalized transactional settings and concentrates on distributed ledgers and blockchain technology (“DLT”). Just as DLT and blockchains are global, so are many related cases and transactions touching upon foreign countries and elements. Under the currently strict interpretation of the presumption against extraterritoriality, however, private parties may be denied their day in court at a time when deterrence of fraud and improvement of market integrity are deeply needed in global technology-enabled markets. Private litigation as a policy mechanism assisting regulatory efforts to police fraud and enforce compliance is thus marginalized.

Yet, private litigation is a crucial instrument that may increase global welfare by enhancing accuracy of disclosure, transparency, and market liquidity. An incongruous construction of the presumption against extraterritoriality by the lower courts, however, threatens to undermine these economic benefits of private litigation in the global blockchain-based transactions and cryptoasset markets. This article will examine how the strict rules of construction increase market fragility, simultaneously leave investors with no recourse, and produce negative economic implications in technology-enabled financial transactions.

Author:

  • Iulian Neamtiu, PhD, New Jersey Institute of Technology

Keywords: Unsupervised Learning, AI/ML reliability, Financial dataset analysis

Abstract:

We illustrate the pitfalls of using Unsupervised Learning (UL) for financial applications, and outline principled, rigorous solutions for mitigating issues found in state-of-the-art UL toolkits.

UL consists of widely used techniques for identifying patterns in unlabeled data, e.g., grouping objects that are related, finding objects that share similar characteristics, or identifying outliers/anomalies. Examples of UL include Clustering, Anomaly Detection, Self-organizing Maps, etc. UL is attractive for several reasons: it does not require labeled data, it does not require a large number of samples to perform well, and it is interpretable/explainable. Hence, unsurprisingly, UL has been used in correctness-critical applications in high-stakes domains including banking, finance, and the medical sciences.

When developers implement UL algorithms, they make decisions such as when to use randomness, which distance measure to use, what “control knobs” (parameters) to offer users, and what default values to use for such parameters. While developers’ intention is to improve performance (accuracy) or increase efficiency (reduce time), these latent assumptions can lead to unreliable execution, which not only lowers UL performance (accuracy) but can also be exploited by adversaries.

Over the past five years, our research group has shown that commodity, state-of-the-art implementations of UL algorithms (Clustering, Anomaly Detection, Self-organizing Maps) are unreliable, and we have proposed solutions for improving reliability. In this talk we will focus on two main issues: nondeterminism (wide output variations across repeated runs of the same implementation on the same dataset) and inconsistency (wide output variations between toolkits on the same dataset). We exposed such issues in popular UL toolkits (Matlab, R, Scikit-learn, TensorFlow) across 500+ datasets covering a wide range of domains: finance, healthcare, cybersecurity, etc.

First, we show the consequences of nondeterminism and inconsistency in Clustering tasks. The objective of Clustering (aka cluster analysis) is to partition a given n-point dataset D into K “clusters”: points within a cluster are related or share a certain characteristic. When Clustering is used on financial datasets, nondeterminism and inconsistency can affect decisions such as personal credit worthiness, analyzing corporate bankruptcy risk, determining municipal bond rating, or determining auto insurance risk pools.
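
A simple way to observe this kind of nondeterminism, using a generic scikit-learn example rather than the toolkits and datasets studied in the talk, is to run the same clustering implementation repeatedly with different seeds and compare the resulting partitions:

```python
# Quick check for clustering nondeterminism: run the same implementation
# repeatedly with different seeds and compare the resulting partitions
# with the adjusted Rand index (1.0 = identical partitions).
from itertools import combinations
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=1000, centers=5, cluster_std=3.0, random_state=42)

# n_init=1 exposes the sensitivity to random initialization.
runs = [KMeans(n_clusters=5, n_init=1, random_state=seed).fit_predict(X)
        for seed in range(10)]

scores = [adjusted_rand_score(a, b) for a, b in combinations(runs, 2)]
print(f"Pairwise ARI: min={min(scores):.3f}, mean={np.mean(scores):.3f}")
```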

Next, we show the consequences of nondeterminism and inconsistency in Anomaly (or outlier) Detection tasks: identifying points in a dataset that were generated by means other than “normal” processes. Note that for fraud or intrusion detection, Anomaly Detection is the increasingly preferred method due to the growing prevalence of unseen (zero-day) attacks. We illustrate issues with Anomaly Detection: unreliable detection of marketing campaign outcomes and unreliable identification of consumer churn. Even more worrisome, we show that nondeterminism in Anomaly Detection can be exploited by an attacker who tries to have a malicious input (outlier) classified as benign (inlier), e.g., to “fly under the radar” when launching a cybersecurity attack or potentially committing financial fraud.
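
An analogous seed-sensitivity check for anomaly detection, again a generic sketch with scikit-learn’s IsolationForest rather than the implementations examined in the talk, measures how much the flagged-outlier set changes across repeated runs:

```python
# Seed-sensitivity check for anomaly detection, using IsolationForest as a
# generic example: how much does the flagged-outlier set change across runs?
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

X, _ = make_blobs(n_samples=2000, centers=3, cluster_std=2.5, random_state=7)

outlier_sets = []
for seed in range(5):
    pred = IsolationForest(contamination=0.02, random_state=seed).fit_predict(X)
    outlier_sets.append(set(np.where(pred == -1)[0]))

# Jaccard overlap between the first run and each subsequent run.
base = outlier_sets[0]
for i, s in enumerate(outlier_sets[1:], start=1):
    jaccard = len(base & s) / len(base | s)
    print(f"run 0 vs run {i}: Jaccard overlap = {jaccard:.2f}")
```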

We then present our solutions.

For Clustering, we show that by addressing issues such as “bad” default parameter settings or noise insertion, we can improve determinism, increase consistency, and even improve efficiency. We validate our approach on popular Clustering algorithms (Affinity Propagation, DBSCAN, Hierarchical Agglomerative Clustering) as implemented in popular toolkits: Scikit-learn, R, MLpack, Matlab, and TensorFlow.

For Anomaly Detection, we present DeAnomalyzer, a tool we developed that uses a feedback-directed, gradient descent-like approach to maximize determinism and consistency for one or more given implementations. DeAnomalyzer has proven successful on popular ML toolkits: MATLAB, R, and Scikit-learn.

Our findings and solutions can benefit UL users, developers, and testers, making UL applications (or decisions made with the support of UL applications) more reliable and trustworthy.

Author:

  • Jay Liebowitz, PhD, Columbia University Data Science Institute

Keywords: Cryptocurrency, FinTech, Data science, Big Data, Crime, Digital assets

Abstract: Based on my new book, Cryptocurrency: Concepts, Technology, and Applications (Taylor & Francis, April 2023), we see that cryptocurrency is certainly a hot topic, especially given the recent issues with FTX. To further explore the interest level of college and graduate students in cryptocurrency, a survey was conducted across four business schools in New Jersey to see how students view cryptocurrency now and in the near future. In addition, a 2022 summer research faculty fellowship at the American Institute for Economic Research provided further insights into this matter. The presentation will focus on the survey results and interviews, and will also suggest what should be taught in a cryptocurrency course or program.

Authors:

  • Fuhuan Li, New Jersey Institute of Technology
  • David Bader, PhD

Keywords: community detection, large-scale data, bitcoin transaction network, parallel algorithm, open-source framework

Abstract: Financial networks, including payment systems, stock markets, and blockchain-based cryptocurrencies, are commonly represented as graphs where nodes represent entities, such as users, financial institutions, and assets, and edges represent financial transactions or relationships between entities. The inherent complexity and interconnectedness of financial data make graphs a natural fit for representing it and, hence, make graph analytics an impactful set of tools. For example, community detection is one of the most powerful algorithms for analyzing complex financial networks. By identifying communities, it can provide valuable insights for FinTech applications such as risk management, fraud detection, and market analysis.

Meanwhile, as the complexity and scale of data increase, the development of novel high-performance parallel graph algorithms becomes necessary. However, conducting such large-scale graph analytics on FinTech data presents challenges due to high parallel communication costs and memory requirements. These challenges require solutions that enable data scientists and finance researchers to handle and analyze large-scale graphs efficiently.

Arkouda is an open-source software framework created to bridge the gap between massive parallel computations and data scientists. By utilizing a communication system between the Chapel back end and the Python front end, Arkouda provides an easy-to-use interface for data scientists to conduct large file and graph analytics without requiring knowledge of the underlying Chapel code. Arachne, which is designed as an extension package to Arkouda, is built to provide large-scale graph analysis for Python users who require interactive graph analytics at scale.

In this work, we will present a parallel community detection algorithm as an example to help demonstrate how easily one can implement a parallel algorithm in Arachne to conduct graph analytics. We then evaluate the efficiency of the algorithm on a Bitcoin transaction network and deliver a reasonable conclusion based on the numerical results. Our work aims to bridge the gap between high-performance computing and data science by providing a straightforward framework for data analysts in the FinTech field.
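
The Arachne/Arkouda interface is not shown here; as a small-scale stand-in, the sketch below runs NetworkX’s greedy modularity community detection on a toy transaction graph to illustrate the kind of structure such an analysis surfaces (the addresses and amounts are invented).

```python
# Small-scale stand-in for the community detection step: NetworkX's greedy
# modularity algorithm on a toy transaction graph (Arachne/Arkouda targets
# graphs far larger than a single machine can comfortably handle).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy "address A sent funds to address B" edges, weighted by amount.
edges = [
    ("a1", "a2", 0.5), ("a2", "a3", 1.2), ("a3", "a1", 0.7),   # one tight group
    ("b1", "b2", 2.0), ("b2", "b3", 0.9), ("b3", "b1", 1.1),   # another group
    ("a1", "b1", 0.05),                                        # weak cross-link
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

communities = greedy_modularity_communities(G, weight="weight")
for i, community in enumerate(communities):
    print(f"community {i}: {sorted(community)}")
```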

Location: Faculty Lounge

2:00 – 2:45

Keynote 2: George Calhoun

Director of the Quantitative Finance Program and the Hanlon Financial Systems Center at Stevens Institute of Technology

(30-minute talk, 15-minute Q&A)

Location: Event Room

2:45 – 3:15

Keynote 3: Marc Rind

Chief Technology Officer-Data, Fiserv

Location: Event Room

3:15 – 3:20

Journal of Big Data Theory and Practice

  • Jim Samuel, Associate Professor of Practice, Executive Director – Informatics, Bloustein School of Planning and Public Policy, Rutgers University

Location: Event Room

3:20 – 3:30

Closing Remarks and Raffle

  • Manfred Minimair, NJBDA Symposium Chair, Professor, Seton Hall University
  • Matt Hale, President, NJBDA, Associate Professor, Seton Hall University

Location: Event Room

3:30 – 5:00

Post-Symposium Reception

Location: Chancellor’s Suite

Speakers

Sponsors

  • CRAFT (Center for Research toward Advancing Financial Technologies)
  • Academy of Applied Analytics and Technology
  • Vikar Technologies
  • Choose New Jersey
  • Middlesex NJ

Event sponsored by the US Economic Development Administration