CrewRide: Applying AI to Make Carpooling More Compatible, Comfortable, and Socially Intelligent

Why Carpooling Matters for Carbon Reduction

According to the U.S. Environmental Protection Agency (EPA), transportation accounts for the largest share—about 28%—of total U.S. greenhouse gas (GHG) emissions, with passenger vehicles being the leading contributors.¹ A typical gasoline-powered passenger vehicle emits about 4.6 metric tons of CO₂ per year.² These emissions stem from the combustion of fossil fuels and are directly proportional to vehicle miles traveled.

Carpooling offers a simple yet powerful solution: every solo driver who becomes a passenger takes a car off the road, cutting both total emissions and traffic congestion. If just one in five solo commuters chose to carpool one day a week, the resulting reduction in emissions could be substantial, potentially comparable to taking millions of vehicles off the road each year.

Rethinking Carpooling

CrewRide is a reimagined carpooling service that prioritizes the human experience—not just transportation efficiency. While most traditional services treat cars merely as tools to get from point A to point B, research and user behavior show that people often perceive cars as deeply personal spaces—sometimes even as a ‘second home.’³ And just like at home, we don’t invite in people we’re uncomfortable with.

CrewRide acknowledges this emotional and behavioral truth. Who joins you in your car, and under what circumstances, profoundly shapes the carpooling experience. That’s why I designed CrewRide to go beyond simple routing or scheduling: my goal is to make carpooling not only environmentally sustainable, but also socially comfortable and psychologically meaningful. By matching co-riders thoughtfully, based on preferences, routines, and interpersonal compatibility, I aim to make shared commuting something people genuinely look forward to.

How Might AI Enhance Compatibility, Comfort, and Social Fit in Carpooling?

While platforms like UberPOOL, Lyft Shared, and Waze Carpool have driven significant progress in shared mobility, they often focus narrowly on route and cost optimization. CrewRide reimagines carpooling as an AI-enhanced social experience — integrating unsupervised learning and large language models to build a system that is not only efficient, but also human-aware, emotionally intelligent, and sustainability-driven.

1. Deep Embedded Clustering for Socially-Aware Ride Matching

A core challenge in carpooling adoption is the unpredictability of social compatibility. While conventional systems match riders based on time and location, CrewRide introduces a research-driven alternative using Deep Embedded Clustering — an unsupervised learning method that encodes behavioral, contextual, and preference-based data to form psychologically cohesive rider clusters. This approach draws from the Homophily Principle⁴, which suggests that people are more likely to build connections with others who share similar values, habits, or interests. By aligning riders with socially and behaviorally compatible peers, CrewRide fosters more comfortable, trustworthy, and repeatable commuting experiences.

2. Preparing for Human Interaction with LLM-Powered Interfaces

Sharing a vehicle involves nuanced interpersonal dynamics. Building on Social Penetration Theory⁵ — which explains how trust and closeness develop through the gradual exchange of personal information, often beginning with shared interests — CrewRide integrates a large language model (LLM) to help riders prepare socially for shared spaces. Examples include real-time conversation topic generators, mutual interest discovery, and preference-aware interaction styles (e.g., silence, music, or talk). These AI-powered features are designed to ease anxiety, encourage meaningful small talk, and make ride-sharing a more emotionally intelligent experience.

3. Algorithmic Fairness and Group-Based Incentive Design

While the current proof-of-concept focuses on ride-matching based on behavioral clustering and social readiness, the design of fair and transparent contribution models remains an important area for further development. Grounded in Equity Theory⁶, future versions of CrewRide could implement rotating contribution systems—such as alternating the driver role based on availability—to ensure perceived fairness among trusted cohorts. Additionally, soft contribution indicators like punctuality or driving history could be shared within the group to support sustained participation.

4. Informing the Design of Future Vehicles

While today’s vehicles are primarily designed for individual drivers, this research aims to inform the next generation of mobility systems — where software and vehicle design co-evolve to support dynamic, socially attuned cohorts. Insights from CrewRide could guide the development of features such as a dedicated “carpool mode” in vehicle interfaces, adaptive cabin layouts for interaction or solitude, and automatic environmental adjustments based on co-rider preferences. Though speculative, these directions highlight how AI-augmented, socially aware systems may shape future vehicles that promote human connection, comfort, and sustainability. Further interdisciplinary research is needed to bring these ideas into practice.

------ NOTE ------

The following section contains technical details and files related to the current development of CrewRide, with a primary focus on the first two features: socially-aware ride matching and LLM-powered ride preparation. The other two features remain conceptual and are intended for future exploration.

STEP 1 – Bootstrapping with Synthetic Humans

The first step in developing this service was data collection and understanding. Since no publicly available dataset captured the detailed behavioral and contextual information I needed, I chose to generate a synthetic dataset from scratch. I designed a rich simulation of 1,000 employees working in the Bay Area, incorporating attributes such as department, job role, commuting schedule, spoken languages, car ownership, carpool availability, preferred communication style, talk and music preferences, sensitivities (e.g., allergies), and interpersonal dynamics like frequent co-riders and do-not-match lists.

Distribution of employee preferences

Employee home locations were randomly distributed around key Bay Area cities, simulating realistic commuting distances and clusters. The result is a high-dimensional dataset that not only reflects practical commuting patterns but also supports modeling of social compatibility and carpool readiness.

Geographic distribution of synthetic employee data across the Bay Area.
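As a rough illustration of this step, the sketch below generates a small set of synthetic employee profiles with homes scattered around city centers. It is a minimal sketch, not the generator used for CrewRide: the field names, the three cities, the probabilities, and the jitter width are all assumptions made for the example.

```python
import random

# Hypothetical city centers (lat, lon); the cities actually used are not listed.
CITY_CENTERS = {
    "San Francisco": (37.7749, -122.4194),
    "Oakland": (37.8044, -122.2711),
    "San Jose": (37.3382, -121.8863),
}
# Illustrative vocabularies, not the original attribute values.
TALK_TOPICS = ["sports", "music", "news", "work", "travel"]
MUSIC_GENRES = ["pop", "rock", "jazz", "classical", "none"]
LANGUAGES = ["English", "Spanish", "Mandarin", "Korean", "Hindi"]
STYLES = ["chatty", "light banter", "quiet"]


def make_employee(emp_id, rng):
    """Build one synthetic profile from a seeded random generator."""
    city, (lat, lon) = rng.choice(list(CITY_CENTERS.items()))
    return {
        "id": emp_id,
        "home_city": city,
        # Gaussian jitter (in degrees) scatters homes around the city center.
        "home_lat": lat + rng.gauss(0, 0.05),
        "home_lon": lon + rng.gauss(0, 0.05),
        "arrive_by": f"{rng.choice([8, 9, 10]):02d}:{rng.choice([0, 30]):02d}",
        "owns_car": rng.random() < 0.6,
        "communication_style": rng.choice(STYLES),
        "talk_preferences": rng.sample(TALK_TOPICS, k=rng.randint(1, 3)),
        "music_preferences": rng.sample(MUSIC_GENRES, k=rng.randint(1, 2)),
        "languages": ["English"] + rng.sample(LANGUAGES[1:], k=rng.randint(0, 2)),
        "do_not_match": [],
    }


def make_dataset(n=1000, seed=42):
    """Generate n profiles; the fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    employees = [make_employee(i, rng) for i in range(n)]
    # Do-not-match lists point at other employees, so fill them afterwards.
    for emp in employees:
        if rng.random() < 0.05:
            other = rng.randrange(n - 1)
            # Shift the index past the employee's own id to avoid self-matches.
            emp["do_not_match"].append(other if other < emp["id"] else other + 1)
    return employees


employees = make_dataset()
```

Seeding the generator means the same 1,000 profiles can be regenerated whenever the clustering pipeline is rerun.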

STEP 2 – Clustering Synthetic Employees with DEC

The process begins with a comprehensive preprocessing stage designed to prepare heterogeneous employee profile data for representation learning. Raw time fields are first normalized into minute-based numerical values, enabling compatibility with downstream models. Categorical variables such as co-rider gender preference are one-hot encoded, while multi-label attributes—like music preferences and spoken languages—are transformed using multi-label binarization. These processed features are then aggregated into a unified dense feature matrix that serves as the input to the learning pipeline.
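The preprocessing described above can be sketched in plain Python as follows. This is an illustrative sketch only: the profile keys (arrive_by, corider_gender_pref, music_preferences, languages) and the two example profiles are hypothetical, and a real pipeline would likely also rescale the minute values before feeding them to an autoencoder.

```python
def time_to_minutes(hhmm):
    """Convert an 'HH:MM' string into minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def one_hot(value, categories):
    """One-hot encode a single categorical value."""
    return [1.0 if value == c else 0.0 for c in categories]


def multi_hot(values, vocabulary):
    """Multi-label binarization: one slot per vocabulary item."""
    present = set(values)
    return [1.0 if v in present else 0.0 for v in vocabulary]


def build_feature_matrix(profiles):
    """Return (rows, feature_names) as a dense numeric matrix."""
    # Vocabularies are derived from the data and sorted for a stable column order.
    genders = sorted({p["corider_gender_pref"] for p in profiles})
    music = sorted({m for p in profiles for m in p["music_preferences"]})
    langs = sorted({l for p in profiles for l in p["languages"]})

    names = (["arrive_by_minutes"]
             + [f"gender_pref={g}" for g in genders]
             + [f"music={m}" for m in music]
             + [f"lang={l}" for l in langs])
    rows = []
    for p in profiles:
        row = [float(time_to_minutes(p["arrive_by"]))]
        row += one_hot(p["corider_gender_pref"], genders)
        row += multi_hot(p["music_preferences"], music)
        row += multi_hot(p["languages"], langs)
        rows.append(row)
    return rows, names


# Two hypothetical profiles, used only to show the output shape.
profiles = [
    {"arrive_by": "08:30", "corider_gender_pref": "any",
     "music_preferences": ["jazz", "pop"], "languages": ["English"]},
    {"arrive_by": "09:00", "corider_gender_pref": "same",
     "music_preferences": ["rock"], "languages": ["English", "Korean"]},
]
rows, names = build_feature_matrix(profiles)
```

Each profile becomes one fixed-length row, so profiles with different numbers of music genres or languages still line up column by column.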

To extract structure from this high-dimensional space, I adopted a Deep Embedded Clustering (DEC) approach. The first phase involves training a deep autoencoder to learn compact latent representations by minimizing reconstruction loss over the input space. The resulting embeddings are then clustered using K-Means, with the number of clusters chosen by weighing silhouette analysis against the elbow method. In the final stage, the clustering assignments are refined through a DEC optimization loop, which iteratively minimizes the KL divergence between soft assignments and a sharpened target distribution. This enables the system to discover stable and behaviorally meaningful employee segments that are ready for downstream applications such as ride matching or cohort analysis.
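To make the refinement stage concrete, the sketch below computes the three quantities at the heart of the standard DEC formulation: Student's t soft assignments, the sharpened target distribution, and the KL divergence between them. It is a plain-Python illustration under the assumption that CrewRide follows that standard formulation; the toy embeddings and centers are made up, and the gradient updates to the encoder and cluster centers are omitted.

```python
import math


def soft_assign(embeddings, centers, alpha=1.0):
    """Student's t soft assignment q[i][j] of point i to cluster j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centers:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q


def target_distribution(q):
    """Sharpen q: square each entry, divide by cluster frequency, renormalize."""
    freq = [sum(col) for col in zip(*q)]  # soft cluster sizes
    p = []
    for row in q:
        weights = [v * v / f for v, f in zip(row, freq)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Toy 2-D embeddings and two cluster centers (illustrative values only).
embeddings = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
centers = [[0.0, 0.0], [4.0, 4.0]]

q = soft_assign(embeddings, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Because the target distribution puts more weight on confident assignments, minimizing this loss pulls each embedding toward the center it already leans toward, which is what tightens the clusters over successive iterations.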

In the post-DEC clustering evaluation, the elbow plot shows a sharp decrease in inertia from k=2 to k=3, after which the curve flattens, implying that three clusters capture most of the structure in the latent space. The silhouette score peaks at k=2, remains high at k=3, and declines notably beyond that point. Although k=2 offers the highest cohesion and separation, k=3 strikes a more practical balance between cluster compactness and interpretability, so I chose k=3 for segmentation.
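For readers unfamiliar with the silhouette score, the sketch below computes it from scratch for a handful of points. It is meant only to show what the metric measures; the toy points and labels are invented, and a production pipeline would more likely call a library implementation on the latent embeddings.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes a score of 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Four 1-D points forming two obvious groups (illustrative values only).
points = [[0.0], [1.0], [10.0], [11.0]]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

Scores near 1 indicate compact, well-separated clusters, while negative scores indicate points that sit closer to another cluster than to their own. Comparing this value across candidate k is what silhouette analysis amounts to.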

In the initial projections (above set), both PCA and t-SNE plots show a broadly continuous distribution of points with no clearly distinguishable clusters. Although there is some spread in the PCA projection and radial dispersion in t-SNE, no grouping is visible without cluster labels, which suggests that the raw feature space, immediately after preprocessing, does not naturally lend itself to interpretable grouping.

By contrast, the final projections after DEC training (above set) exhibit three well-separated and compact clusters, made visible through color-coded DEC assignments. The PCA projection shows distinct grouping along a curved manifold, while the t-SNE embedding forms clearly partitioned clusters across spatial regions. This transformation demonstrates how DEC, through representation learning and KL-based clustering refinement, reshapes the latent space to amplify behavioral patterns that were previously entangled. The resulting clusters are not only structurally coherent but also more likely to reflect meaningful carpool preference groupings.

STEP 3 – Interpreting Latent Clusters Through Behavioral Feature Contrast

To interpret the behavioral composition of the DEC-generated clusters, I performed a differential feature analysis that compared the average value of each feature within a cluster to the overall population mean. For each cluster, the five features with the largest deviations were labeled as “High” or “Low” based on whether their values were above or below the global average. This method produced concise, feature-level characterizations that enhance the interpretability of high-dimensional embeddings in the context of user behavior.

Cluster 0 (“The Outgoing Enthusiasts”): Characterized by sociable, high-energy behavior with a strong preference for chatty conversation and sports-related topics.

Cluster 1 (“The Reserved Relaters”): Represents calm, introspective individuals who prefer quiet communication and light banter.

Cluster 2 (“The Selective Listeners”): Defined by a self-directed and low-engagement communication style. They prefer music and avoid work-related or casual chatter.
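The differential feature analysis behind these cluster descriptions can be sketched as follows. This is a minimal illustration of the idea described above (per-cluster mean minus population mean, top deviations labeled High or Low); the feature names and the four toy rows are hypothetical, not CrewRide data.

```python
def cluster_feature_contrast(rows, labels, feature_names, top_n=5):
    """For each cluster, list the top_n features that deviate most from the global mean."""
    n_features = len(feature_names)
    global_mean = [sum(r[f] for r in rows) / len(rows) for f in range(n_features)]

    summary = {}
    for cluster in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == cluster]
        deviations = []
        for f in range(n_features):
            cluster_mean = sum(r[f] for r in members) / len(members)
            deviations.append((feature_names[f], cluster_mean - global_mean[f]))
        # Rank by absolute deviation; the sign decides the High/Low label.
        deviations.sort(key=lambda item: abs(item[1]), reverse=True)
        summary[cluster] = [
            (name, "High" if diff > 0 else "Low", round(diff, 3))
            for name, diff in deviations[:top_n]
        ]
    return summary


# Toy binary feature matrix with two clusters (illustrative values only).
feature_names = ["talk=chatty", "style=quiet", "music=pop"]
rows = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.0, 1.0, 0.5],
]
labels = [0, 0, 1, 1]
contrast = cluster_feature_contrast(rows, labels, feature_names, top_n=2)
```

In this toy case cluster 0 comes out as High on chatty talk and Low on quiet style, which is the kind of contrast that was turned into the human-readable cluster names.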

STEP 4 – LLM-Powered Ride Preparation for Social Readiness

CrewRide addresses the often-overlooked social dynamics of shared commuting by offering an AI-powered ride preparation feature. When a user accesses this feature, the system retrieves the talk preferences of co-riders from the database using Prisma, and queries NewsAPI.org for relevant, up-to-date headlines based on shared interests. This contextual data is combined into a narrative prompt, which is passed to a lightweight conversational model, Llama 3.2 3B, to generate socially appropriate small-talk suggestions in structured JSON format. The result is a curated list of casual conversation topics that helps riders feel more prepared and socially at ease.
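The prompt-assembly and response-parsing steps can be illustrated with the sketch below. It is written in Python purely for illustration, whereas the actual feature runs inside the Next.js backend; the function names, prompt wording, and JSON shape ("topics" with "title" and "opener") are assumptions, and the database lookup, NewsAPI request, and model call are deliberately left out.

```python
import json
from collections import Counter


def shared_interests(co_rider_prefs, min_riders=2):
    """Topics that at least min_riders co-riders have in common."""
    counts = Counter(topic for prefs in co_rider_prefs for topic in set(prefs))
    return sorted(t for t, c in counts.items() if c >= min_riders)


def build_prompt(co_rider_prefs, headlines_by_topic, max_topics=3):
    """Combine shared interests and headlines into one narrative prompt."""
    topics = shared_interests(co_rider_prefs)
    lines = [
        "You are helping carpool riders prepare for light small talk.",
        f"Shared interests: {', '.join(topics) or 'none'}.",
        "Recent headlines:",
    ]
    for topic in topics:
        for headline in headlines_by_topic.get(topic, []):
            lines.append(f"- [{topic}] {headline}")
    lines.append(
        f'Reply with JSON only: {{"topics": [up to {max_topics} objects '
        f'with "title" and "opener"]}}'
    )
    return "\n".join(lines)


def parse_suggestions(raw_reply):
    """Parse the model reply defensively; malformed output yields an empty list."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []
    topics = data.get("topics", []) if isinstance(data, dict) else []
    return [t for t in topics
            if isinstance(t, dict) and "title" in t and "opener" in t]


# Hypothetical inputs standing in for the database and news lookups.
prefs = [["sports", "music"], ["sports", "news"], ["music", "sports"]]
headlines = {"sports": ["Local team wins opener"], "music": ["Festival lineup announced"]}
prompt = build_prompt(prefs, headlines)
```

Parsing defensively matters with a small model, since it may occasionally return text that is not valid JSON; falling back to an empty list lets the app simply show no suggestions rather than fail.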

Recent advances in fine-tuning language models for small-talk applications emphasize strategies that improve contextual relevance, tone sensitivity, and personalization. Techniques like Supervised Fine-Tuning (SFT) on dialogue datasets, such as PersonaChat for persona alignment and DailyDialog for emotional nuance, are especially relevant to CrewRide’s socially aware goals. In parallel, Reinforcement Learning from Human Feedback (RLHF) is evolving beyond general-purpose tuning to incorporate application-specific reward models, including those that prioritize inclusiveness and comfort in shared ride scenarios.

STEP 5 – Next.js Backend as the Infrastructure of CrewRide Intelligence

The server, built with Next.js, provides structured API endpoints for the mobile carpooling app. Next.js’s file-based routing simplifies backend maintenance and makes it straightforward to add endpoints as the service grows. Session management and encryption are still pending, but Next.js’s architecture leaves room to integrate these security features ahead of mobile deployment.

The current server solution supports the following key functionalities through clearly segmented API endpoints:

Authentication and User Management: employee registration, user login, profile management, user signup
User Information Handling: employee headshot upload/retrieval, real-time location tracking
Messaging System: user-to-user message handling and communication
Ride Management: ride search, join rides, modify rides, offer rides, cancel rides, complete rides, ride feedback, current rides by employee, ride history access, and ride prep

STEP 6 – Mobile App as the Entry Point to CrewRide Intelligence

The Android mobile client for CrewRide is designed to deliver a seamless and intuitive user experience, with a focus on modularity and maintainability behind the scenes. It provides dedicated screens for ride search, ride offers, ride history, in-app messaging, and user profile management, all tightly integrated with the backend API. The interface prioritizes clarity and simplicity, aiming to minimize friction in everyday use. While the underlying codebase is structured to support scalable feature development, what users encounter is a clean, responsive experience. This mobile app serves as the primary point of interaction with CrewRide’s intelligent matching engine, making it an ideal platform for future research on user behavior, personalization, and trust in AI-powered mobility services.

FUTURE WORK

While the present prototype demonstrates key functionalities such as socially-aware ride matching and LLM-powered ride preparation, several important areas remain open for further investigation:

Real-world user evaluation: Testing CrewRide in real commuting environments to assess usability, social comfort, and ride satisfaction over time.

Behavioral clustering optimization: Fine-tuning clustering parameters and feature weights to reflect evolving social and commuting patterns more accurately.

Domain-specific LLM fine-tuning: Adapting large language models for carpool-relevant contexts, including small talk, etiquette, and personalized content delivery.

Equitable contribution modeling: Designing transparent mechanisms to balance contributions (driving, route flexibility, punctuality) among riders and drivers.

Future vehicle design insights: Extracting design requirements from shared ride data to inform the development of future car interiors optimized for social and collaborative commuting.

This marks the conclusion of the current development phase of CrewRide. The project establishes a working foundation and offers a strong launchpad for continued research, iteration, and real-world deployment in human-centric mobility systems.
