Event Tracking SaaS

What is this project?

Event Tracking SaaS is a project I’m planning to work on in the near future. In the meantime, this is the design document I’m developing with ChatGPT.

Life Tracking & Analytics Tool – Technical Design

Introduction

This document outlines the technical design for a single-user life tracking SaaS tool. The tool allows the user to log personal events (activities, health notes, social interactions, etc.) and then provides analytics and insights on those logged events. It will be built with Next.js (App Router) and deployed on Vercel, using Next.js API routes for backend logic. Data is stored in a Supabase PostgreSQL database. The user interface will utilize shadcn/ui components for a consistent look and feel, and Recharts will be used to visualize analytics.

The system is initially single-user (no authentication needed), but the data model and architecture are designed to be easily extensible to multi-user in the future. Key features include free-text event logging with optional metadata, automatic NLP-based tagging and categorization (using both local logic and OpenAI GPT), attaching location data to events, a timeline view for browsing events, scheduled background analysis for correlations and trends (via Vercel cron jobs), and an analytics dashboard for visualizing insights.

System Overview

  • Event Logging: Users can quickly log events via a free-text input field. The input supports additional metadata like date/time (if logging a past event) and simple structured syntax (e.g. hashtags for tags).
  • Natural Language Processing (NLP): Each event description is processed using regex-based rules and the OpenAI GPT API to extract structured information – e.g. identifying tags/categories (health, sleep, social, etc.), people mentioned, numeric metrics (like durations or quantities), and date/time (if mentioned in text).
  • Tagging & Categorization: Events are annotated with both user-defined and inferred tags. Tags can be hierarchical (e.g. social:family:son) or have prefixes like person:Jim for individuals’ names. For example, an event “called Jim” might be tagged with person:Jim and categorized under a broader social/family tag if Jim is family. “Played with son” would be tagged social:family:son to indicate a family-related social event.
  • Location Metadata: The event logging UI allows attaching location data via the browser’s geolocation API. If enabled, each event can include latitude/longitude and an optional human-readable place name.
  • Timeline View: The UI displays logged events in a chronological timeline, grouped by date. Users can filter the timeline by tags (e.g. show only “health” events), keywords, and date ranges to easily find specific events.
  • Batch Analysis (Insights): A scheduled background job (using Vercel Cron) runs periodic analysis on the accumulated event data. This computes insights such as correlations (e.g. “Late sleep correlated with next-day headaches”), trends (e.g. increase or decrease in a tag’s frequency over time), and summary statistics. Results are stored in an insights table in the database.
  • Analytics Dashboard: The app provides an analytics view with visualizations (built with Recharts) to display trends and stats. Charts might include timelines of event counts, breakdown of tag frequencies, and any interesting correlations or metrics tracked. Insights from the batch analysis are also displayed here for the user.

All components are designed with future multi-user support in mind. In a multi-user scenario, each user would have their own events and insights (with appropriate authentication and authorization), but the core architecture remains the same.

Architecture Overview

The system follows a modern JAMstack-style architecture with a Next.js frontend and serverless backend, plus a cloud database and third-party NLP service:

  • Next.js Frontend (App Router): Provides the UI and client-side interactivity. Pages are built as React components (with the App Router in Next.js 13+). We will use shadcn/ui components (a Tailwind CSS + Radix UI component library) for building the interface (forms, buttons, lists, etc.). The App Router architecture allows mixing server and client components; we will use server components for initial data fetching where appropriate and client components for interactive parts (like filtering, charts).
  • Next.js API Routes (Backend): Custom API endpoints under the /app/api directory implement server-side logic (these run as Vercel Serverless Functions). Key API routes include event ingestion, fetching events, and triggering analysis. We use Next.js Route Handlers (route.ts files) to define GET/POST handlers for these APIs.
  • Database (Supabase/PostgreSQL): All event data and computed insights are stored in a Supabase-hosted PostgreSQL database. Supabase is used for its managed Postgres and easy client libraries. We will define tables for events, insights, and (optionally) tags. For now, with a single user, we might not need an explicit users table, but we will include a user_id field in tables to allow future multi-user extension. In the future, Supabase Auth could manage user accounts.
  • NLP Integration (OpenAI GPT): The backend will integrate with OpenAI’s GPT API (e.g. GPT-4 or GPT-3.5) to analyze event text. This is done server-side (within API routes) to keep the API key secure. We’ll send event descriptions to GPT with a prompt asking for relevant tags, entities (people, places), and any metrics, then parse the response.
  • Scheduled Jobs (Vercel Cron): We leverage Vercel’s cron feature to run scheduled jobs. A cron job will periodically call a specific API route (for example, /api/cron/analyze) at a defined schedule (e.g. nightly). This job will perform batch analysis on events (correlation, trending calculations) and update the insights in the database.
  • Analytics & Visualization: On the frontend, an Analytics Dashboard page will fetch data (events or insights) and render charts using Recharts (a React charting library). Chart components will be implemented as client-side components (with use client directive) since they involve dynamic rendering and possibly interactivity (tooltips, etc.).
  • Deployment: The app will be deployed on Vercel. The Next.js app (including API routes and static assets) will be hosted by Vercel. The Supabase database is a separate cloud service the API communicates with (via the Supabase client or REST API). Environment variables (Supabase URL/keys, OpenAI API key) will be configured in Vercel’s settings to be available at build and runtime.

Below is a simplified view of how the components interact:

[ User Browser ]
    ↳ Frontend UI (Next.js React + shadcn/ui)  
         ↳ (calls) Next API routes (serverless)
              ↳ (queries) Supabase Postgres (events, insights)
              ↳ (calls) OpenAI API (for NLP)
         ↳ (renders) Charts with Recharts (data from API/DB)
[ Vercel Cron ]
    ↳ (triggers) Next API route for analysis on schedule
              ↳ (queries) Supabase Postgres (events)
              ↳ (writes) Supabase Postgres (insights)

Since the tool is single-user, we won’t implement an auth flow initially. All data is effectively under one user context. However, the user_id field in the database and the structure of API routes would allow plugging in an authentication mechanism later (for example, using Supabase Auth or NextAuth) to support multiple users with isolated data.

Data Model and Schema

We define three primary data models: Event, Tag, and Insight. For simplicity in the initial implementation, we may not use a separate Tag table (tags can be stored as an array or JSON in the Event record). The schema below is designed to be extensible.

Event

The Event model represents a single logged event or activity. It contains the raw text description the user entered, a timestamp, and various structured metadata extracted from the text (tags, people, metrics, location, etc.).

Key fields for an Event:

  • id (UUID or auto-increment): Primary key.
  • user_id (UUID): Reference to a user (not used in single-user mode, but needed for multi-user).
  • text (Text): The original description entered by the user.
  • occurred_at (Timestamp): The date/time the event occurred. This can be provided by the user or default to the log time.
  • created_at (Timestamp): When the event was logged (defaults to NOW()).
  • tags (Text array or JSONB): A list of tags assigned to the event (could be an array of strings like ["health", "sleep", "person:Jim"]).
  • people (Text array, optional): A list of person names mentioned (e.g. ["Jim"]). We might also encode people as person: tags instead of a separate field.
  • metrics (JSONB, optional): Key-value pairs for numeric metrics extracted (e.g. { "distance_km": 5, "duration_min": 30 } for an event “Ran 5km in 30min”).
  • location_lat & location_lon (Decimal, optional): Geographic coordinates if location is attached.
  • location_label (Text, optional): Human-readable location name/label (if provided or resolved).
  • updated_at (Timestamp): When the event was last modified (if editing is allowed later).

In PostgreSQL, the events table might look like:

-- PostgreSQL schema (e.g., run via Supabase migration)
CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID,
  text TEXT NOT NULL,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  tags TEXT[] DEFAULT ARRAY[]::TEXT[],    -- array of tag strings
  people TEXT[] DEFAULT ARRAY[]::TEXT[],  -- array of person names (optional)
  metrics JSONB,                          -- JSON of metric key-values (optional)
  location_lat DECIMAL(9,6),
  location_lon DECIMAL(9,6),
  location_label TEXT,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

In TypeScript (for use in the application code), an Event type might be defined as:

type Event = {
  id: string;
  user_id: string;
  text: string;
  occurred_at: string;     // ISO date string
  created_at: string;
  tags: string[];          // e.g. ["health", "sleep", "person:Jim"]
  people?: string[];       // e.g. ["Jim", "Alice"]
  metrics?: Record<string, number>; // e.g. { "weight_kg": 70 }
  location?: {
    lat: number;
    lon: number;
    label?: string;
  };
};

Tag Storage: In this design, we store tags as a text array within the Event record for simplicity. Each tag is a single string (which may contain : to indicate hierarchy, e.g. social:family:son). In a future design or if we need to query tags across events efficiently, we could introduce a separate tags table and an event-tag join table. For now, the tags array with proper indexing (GIN index for text array) should be sufficient. We will interpret prefixes like person:Name as a special kind of tag representing people.
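
For reference, that GIN index could be created alongside the table (the index name is illustrative):

-- Speeds up array-containment queries like tags @> '{health}'
CREATE INDEX idx_events_tags ON events USING GIN (tags);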

Insight

The Insight model represents a derived insight or analytic result from batch analysis. Each insight could be a statistical result, a detected correlation, or a trend summary.

Key fields for an Insight:

  • id (UUID): Primary key.
  • user_id (UUID): Owner user reference (again, only one user initially).
  • type (Text): The category of insight ("correlation", "trend", "statistic", etc.).
  • title (Text): A short title or name for the insight (e.g. “Late Sleep → Headache Correlation”).
  • description (Text): A human-readable description of the insight (e.g. “Headaches occur on 80% of days after you sleep past midnight.”).
  • data (JSONB, optional): Any structured data relevant to the insight (for example, a JSON object with values used in a chart, or the raw numbers behind the insight).
  • created_at (Timestamp): When the insight was generated.

Schema example for insights table:

CREATE TABLE insights (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  description TEXT,
  data JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

And a TypeScript type:

type Insight = {
  id: string;
  user_id: string;
  type: 'correlation' | 'trend' | 'statistic' | string;
  title: string;
  description: string;
  data?: any;         // e.g. could hold a series of points or related tags
  created_at: string;
};

Example: An insight record for the correlation between late sleep and headaches might look like:

{
  "id": "uuid-...",
  "user_id": "uuid-user",
  "type": "correlation",
  "title": "Late Sleep vs Headaches",
  "description": "On 5 of the 7 nights you went to bed after midnight, you reported a headache the next day.",
  "data": { "lateNights": 7, "correlatedHeadacheDays": 5, "correlationRate": 0.714 },
  "created_at": "2025-04-21T00:00:00Z"
}

Tag (Optional)

If we later support a multi-user system or want to manage a controlled vocabulary of tags, we could introduce a Tag model:

  • id (UUID)
  • user_id (UUID)
  • name (Text): e.g. "health" or "social:family:son".
  • parent_id (UUID, nullable): to allow hierarchical tags (e.g. a tag with name “social:family” could be the parent of “social:family:son”).
  • type (Text, nullable): could classify tags, e.g. "person" for tags that represent people.

For now, we will skip a dedicated tags table and use simple conventions in the events.tags array. But this is noted as an extensibility point.
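
Should we adopt it, a sketch of the table following the fields above:

-- Optional future table; not part of the initial schema
CREATE TABLE tags (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID,
  name TEXT NOT NULL,
  parent_id UUID REFERENCES tags(id),  -- hierarchical tags
  type TEXT                            -- e.g. 'person'
);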

Event Logging and Input

Logging a new event is the primary interaction for the user. The design should make it quick and flexible to capture any kind of event.

Event Input UI

On the frontend, we will have an Event Logging Form. This could be placed on the main timeline page (e.g. at the top as a quick-add bar) or as a dedicated input section. Using shadcn/ui components, we can create a form that includes:

  • A text input (or textarea) for the event description.
  • (Optionally) a date/time picker if the user wants to log an event for a time other than “now”. By default, if no date/time is specified, the current time is used.
  • (Optionally) a control to attach location (e.g. a checkbox or toggle labeled “Attach current location”).
  • A submit button to save the event.

Free-text Input: The user can type any description, for example:

  • "Went for a 5km run with Jim at the park"
  • "Had a headache in the morning"
  • "Called mom and dad on Zoom"
  • "Weight check: 70kg"

The user is free to include structured cues in the text if they wish:

  • They might use hashtags for tags, e.g. "Went for a run #exercise #health".
  • They might mention a specific date/time or use natural language for time, e.g. "Yesterday evening - watched a movie".
  • They can mention people by name (“Jim”, “Mom”) or relationship (“son”, “dad”).

The system will interpret these via NLP (described later), so the user doesn’t have to do special formatting. But advanced users could adopt simple conventions (like #tag or a “key: value” syntax for metrics) which we will parse.

Date/Time Input: We’ll include a small date/time picker (using a component from shadcn/ui or a simple input type="datetime-local") for cases where the user wants to log an event that happened in the past or schedule one in the future. If left blank, occurred_at will default to now. If the user types a date/time in the text itself, our NLP will attempt to extract it as well, but an explicit field is more reliable for precise input.

Location Input: A toggle or button “📍 Add Location” will trigger the browser’s geolocation API. If the user grants permission, we capture the coordinates. We might display a preview (like “Location: [latitude, longitude]” or attempt to reverse-geocode to a name if possible). The user could also be allowed to label the location (e.g. type “Home” or “Office”). For MVP, we can just capture coordinates and maybe let the user enter a label manually if they want.

Using shadcn/ui: The form will use shadcn/ui form components for styling:

// Pseudocode for the Event Form (JSX/React)
<form onSubmit={handleSubmit} className="flex gap-2 items-center">
  <Input 
    placeholder="Log an event..." 
    value={text} 
    onChange={e => setText(e.target.value)} 
    className="flex-1"
  />
  <Popover> {/* for datetime picker, on click opens calendar/time selector */}
    <PopoverTrigger asChild>
      <Button variant="outline">{selectedDate ? format(selectedDate) : "Now"}</Button>
    </PopoverTrigger>
    <PopoverContent><Calendar onSelectDate={setSelectedDate} /></PopoverContent>
  </Popover>
  <Button type="button" variant={attachLocation ? "solid" : "outline"} onClick={toggleLocation}>
    📍
  </Button>
  <Button type="submit">Add Event</Button>
</form>

In this snippet:

  • <Input> and <Button> are shadcn/ui components.
  • A <Popover> is used to attach a date picker (for simplicity, pseudo-code assumes a Calendar component).
  • The location button toggles whether to attach location; if toggled, we call navigator.geolocation.getCurrentPosition to get coordinates (likely just before submitting or immediately when toggled).
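
For completeness, a sketch of the handleSubmit wired to the ingestion API described next (state setters such as setEvents and setText, and the location state, are assumed to exist in the surrounding component):

// Posts the form state to POST /api/events; on success, prepend the
// created event to local state so the timeline updates immediately.
async function handleSubmit(e: React.FormEvent) {
  e.preventDefault();
  const res = await fetch('/api/events', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text,
      occurred_at: selectedDate?.toISOString(),      // omitted if "Now"
      location: attachLocation ? location : undefined,
    }),
  });
  if (res.ok) {
    const created = await res.json();
    setEvents(prev => [created, ...prev]);
    setText('');
  }
}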

Event Ingestion API (POST /api/events)

When the user submits the event form, the frontend will send a request to a Next.js API route (e.g. POST /api/events) to actually record the event in the database.

API Endpoint: /api/events (under the App Router, this corresponds to a file app/api/events/route.ts with a POST handler for creation, and possibly a GET handler for retrieval).

Request Payload: The client will send a JSON payload containing the event details:

{
  "text": "Went for a 5km run with Jim at the park",
  "occurred_at": "2025-04-21T18:30:00Z",    // optional, if user picked a date/time
  "location": { "lat": 12.34, "lon": 56.78, "label": "Central Park" }  // optional
}

If the user did not manually provide occurred_at or location, those fields may be omitted or null (the server will default the time and ignore location).

Server-side Processing: The POST handler will perform the following steps:

  1. Parse & Validate Input: Ensure text is present and not empty. If occurred_at is provided, parse it into a Date object or ISO string. If not provided, use current time.
  2. Basic NLP Processing (Sync): Before storing, we might do a quick local analysis (regex for known patterns, etc.) to extract preliminary tags or info. (Alternatively, we could do this after storing, but doing some upfront can allow storing structured data immediately.)
    • For example, find hashtags in the text (e.g. #exercise), or special syntax like weight:70kg.
    • We might remove these annotations from the raw text or keep them; likely keep the text as is, but we can still extract the structured pieces.
    • Identify if the text contains an explicit date/time like “yesterday” or “at 6pm” – if yes, we could adjust occurred_at accordingly (using a date parsing library or our own logic).
  3. Insert Raw Event: Create a new event record in the database with the information we have so far (text, occurred_at, created_at, and possibly any tags/metrics we already extracted locally). At this stage, tags might include user-specified ones (from hashtags or obvious keywords) but we haven’t done the full GPT analysis yet. We might insert the event with an empty or partial tags list and then update it after NLP, or we perform NLP first and insert once with all data (this depends on whether we want synchronous or asynchronous NLP – see discussion below).
  4. NLP enrichment (GPT call): Call the OpenAI API to get advanced analysis of the event text (unless we decide to do this asynchronously). The GPT integration will extract tags, people, etc. (detailed in the NLP section). Once we get the results, we update the event’s tags, people, metrics fields accordingly. This could be done via an UPDATE query if we already inserted, or we delay insertion until after we get the GPT response to insert everything at once.
  5. Respond to client: Return the created event (with its generated id and all metadata). If the NLP tagging was done synchronously, the response will include the tags and any structured data. If we choose to do NLP asynchronously (e.g. via a separate job), the initial response might just confirm the event was logged, and later the frontend could fetch updated data. For simplicity, we aim to do it in one go so that the event in the timeline immediately shows categorized tags.

Concurrency Note: Since the GPT call can take some time (a few hundred milliseconds), the client should handle the submission with a loading state. Alternatively, we could insert the event first (so it appears on the timeline immediately in an “unsorted” state) and then do the GPT analysis in the background (perhaps via a separate API route or in the background of the same request). However, doing it within the same request is straightforward and ensures consistency (the user sees final categorized event once it’s returned).

API Route Handler Example (pseudo-code):

// app/api/events/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createClient } from '@supabase/supabase-js';
import { extractBasicMeta } from '@/lib/localNlp';  // hypothetical local parsing util
import { getOpenAISuggestions } from '@/lib/openai'; // util to call GPT API

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

// Small helper: merge two arrays, dropping duplicates
const mergeArraysUnique = (a: string[] = [], b: string[] = []) => Array.from(new Set([...a, ...b]));

export async function POST(req: NextRequest) {
  try {
    const { text, occurred_at, location } = await req.json();
    if (!text) {
      return NextResponse.json({ error: 'Text is required' }, { status: 400 });
    }

    // Determine timestamp
    const eventTime = occurred_at ? new Date(occurred_at) : new Date();

    // Basic local extraction (regex, keywords)
    let { tags, people, metrics, datetimeOverride } = extractBasicMeta(text);
    if (datetimeOverride) {
      eventTime.setTime(datetimeOverride.getTime());
    }
    // Merge any tags from local extraction (e.g. hashtags) with initial tags list
    // Ensure uniqueness
    tags = Array.from(new Set(tags));

    // Insert event with initial data (without GPT enhancements yet)
    const { data: insertData, error: insertError } = await supabase.from('events').insert([{
      user_id: null,  // single user mode (or a constant ID if we define one)
      text,
      occurred_at: eventTime.toISOString(),
      tags,
      people,
      metrics,
      location_lat: location?.lat,
      location_lon: location?.lon,
      location_label: location?.label
    }]).select().single();  // .select().single() to return the inserted row
    if (insertError) throw insertError;
    let event = insertData;

    // Call OpenAI for NLP suggestions (tags, entities) – this could also refine date/time
    const gptResult = await getOpenAISuggestions(text);
    if (gptResult) {
      // Merge GPT results with existing tags/people/metrics
      event.tags = mergeArraysUnique(event.tags, gptResult.tags);
      event.people = mergeArraysUnique(event.people, gptResult.people);
      // Merge metrics (possibly combine or overwrite if same keys)
      event.metrics = { ...event.metrics, ...gptResult.metrics };
      // If GPT found a specific datetime in text (and user didn't manually set one)
      if (!occurred_at &amp;&amp; gptResult.datetime) {
        event.occurred_at = gptResult.datetime;
      }

      // Update the event record with the new metadata
      await supabase.from('events').update({
        tags: event.tags,
        people: event.people,
        metrics: event.metrics,
        occurred_at: event.occurred_at
      }).eq('id', event.id);
    }

    return NextResponse.json(event, { status: 201 });
  } catch (err) {
    console.error('Error in POST /api/events:', err);
    return NextResponse.json({ error: 'Internal Server Error' }, { status: 500 });
  }
}

In the above pseudocode:

  • extractBasicMeta is a local function that might use regex to find hashtags (adding them to tags), find patterns like X kg or X km (adding to metrics), find known keywords, and also detect words like “yesterday” which it converts to a date in datetimeOverride.
  • getOpenAISuggestions calls the OpenAI API (e.g. a Chat Completion) with the event text and returns an object with { tags: string[], people: string[], metrics: Record<string, number>, datetime?: string } based on GPT’s interpretation. We’ll describe its implementation soon.
  • We insert the event first to get an ID. Alternatively, we could call OpenAI first and then insert with the full data – either approach works. The above approach ensures the event is stored even if OpenAI fails.
  • We use supabase.from('events').insert([...]).select().single() which is a Supabase client trick to return the inserted row in one call (Supabase supports RETURNING * under the hood).
  • After GPT, we update the event with any new tags or changes. We merge arrays ensuring uniqueness (to avoid duplicate tags if both local and GPT found the same).

Note: If the OpenAI API is down or slow, we should have a timeout or fallback so the event still gets logged. In the worst case, the event remains with only basic tags; the user can still see it in the timeline, and the next scheduled analysis could refine it later if needed.
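
One way to bound the GPT call, sketched with a simple race (the 5-second budget is illustrative):

// Race the GPT call against a timer; on timeout we return null and the
// event keeps only its locally extracted tags.
function withTimeout<T>(promise: Promise<T | null>, ms: number): Promise<T | null> {
  const timer = new Promise<null>(resolve => setTimeout(() => resolve(null), ms));
  return Promise.race([promise, timer]);
}

// In the POST handler:
// const gptResult = await withTimeout(getOpenAISuggestions(text), 5000);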

Event Retrieval API (GET /api/events)

For displaying events on the timeline or analytics, we need to fetch events from the database. We will implement a GET /api/events route to retrieve events, possibly with query parameters for filtering.

Basic usage: A call to GET /api/events returns a list of events (likely in chronological or reverse chronological order). In single-user mode, it will fetch all events for the single user. In multi-user future, it would filter by the authenticated user’s ID (likely using an auth middleware or by reading a token).

We can support query parameters:

  • ?from=YYYY-MM-DD&to=YYYY-MM-DD to limit by date range.
  • ?tag=health or ?tags=health,exercise to filter by one or multiple tags.
  • ?q=keyword to search text for a substring.

For example, GET /api/events?from=2025-01-01&to=2025-01-31&tags=sleep,health would fetch events in January 2025 that have both “sleep” and “health” tags.

Implementation: In Next.js route handler, we can access query params via request.nextUrl.searchParams. We’ll construct a SQL query or use Supabase query builder to apply filters:

export async function GET(request: NextRequest) {
  const url = request.nextUrl;
  const from = url.searchParams.get('from');
  const to = url.searchParams.get('to');
  const tagsParam = url.searchParams.get('tags');
  const search = url.searchParams.get('q');

  let query = supabase.from('events').select('*').order('occurred_at', { ascending: true });
  // Apply filters:
  if (from) query = query.gte('occurred_at', new Date(from).toISOString());
  if (to) query = query.lte('occurred_at', new Date(to).toISOString());
  if (tagsParam) {
    const tagsList = tagsParam.split(',');
    // Filter events that contain ALL of the tags in tagsList:
    for (let tag of tagsList) {
      query = query.contains('tags', [tag]);
    }
  }
  if (search) {
    // Supabase full-text search could be used if configured; or do a ilike on text.
    query = query.ilike('text', `%${search}%`);
  }

  const { data: events, error } = await query;
  if (error) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
  return NextResponse.json(events, { status: 200 });
}

This is a simplified example. Supabase provides a .contains('tags', ['tagname']) filter to check whether an array column contains a value. For multiple tags, we apply multiple contains filters (meaning events must have all those tags). The text search uses ilike for a case-insensitive substring match.

The timeline view might not actually call this API if we decide to fetch data via server components (Next.js can fetch data directly in a page component on the server side). However, having an API route is useful for client-side filtering or if the timeline is implemented as a client component using SWR/React Query to fetch.

NLP Processing for Event Text

One of the core features is extracting structured information from the free-text event descriptions. We will use a combination of local parsing logic and OpenAI GPT API to achieve this.

Local Parsing and Heuristics

Before incurring the cost and latency of an OpenAI API call, we apply some quick local parsing to catch obvious patterns (a combined sketch follows this list):

  • Hashtags: If the user uses hashtags (e.g. “Had a great workout #fitness #health”), we can easily extract those (fitness, health) as tags. We’ll strip the # and add them to the tags list.
  • Date/Time phrases: We can look for words like “yesterday”, “today”, “tomorrow”, or specific times like “at 6pm”, “noon”, or date strings like “2025-04-20”, “April 20”. A library like chrono-node could be used to parse casual date expressions from text. For example, “yesterday” would be converted to yesterday’s date. If we find a date/time in the text and the user hasn’t explicitly set one via the form, we can use this to adjust the occurred_at.
  • Metrics (numbers with units): Identify common units that might indicate a measurement. For example:
    • \b\d+(\.\d+)?\s?km\b or mi (distance)
    • \b\d+\s?(min|minutes|hrs|hours)\b (duration)
    • \b\d+\s?(kg|lb|lbs)\b (weight)
    • \b\d+\s?(calories|kcal)\b
    • We can capture the numeric value and unit and store it in the metrics JSON. For instance “5km” becomes { distance_km: 5 }, “30min” -> { duration_min: 30 }, “70kg” -> { weight_kg: 70 }.
  • Keywords for categories: We can maintain a simple dictionary of keywords to infer tags:
    • If text contains words like “sleep”, “bed”, “wake up”, we add tag sleep or health:sleep.
    • If contains “headache”, “fever”, “doctor”, tag health (or health:illness).
    • “run”, “exercise”, “gym”, “workout” => tag health:exercise.
    • “called”, “met with”, “with [Name]” => tag social (and possibly social:family or social:friends depending on the name).
    • Relationship words: “son, daughter, mom, dad, wife, husband, sister, brother” => tag social:family (and possibly also person:<Name> if a name is given or we can tag the role like family:son).
    • Work-related terms: “meeting, email, project, client” => tag work.
    • Entertainment: “movie, TV, game” => tag leisure.
    • Emotions: “happy, sad, upset, excited” => tag mood.
    • These rules can be refined over time. They serve as quick guesses and can be overridden or added to by GPT output.
  • People names: A regex to find capitalized words (assuming names are capitalized and not at sentence start) can catch some names. For example, if text contains “with Jim” or “Called Alice”, we detect “Jim” or “Alice” as a person. We add person:Jim tag (and also include “Jim” in the people array).
    • We might maintain a list of known contacts (the user’s family/friends) to classify them further (e.g. know that “Jim” = son, so add social:family:son). But initially, without a user-maintained contacts list, we rely on context or GPT to decide family vs friend.
    • However, some cues: “my son Jim” or just the word “son” implies family. If someone is referred to by relationship rather than name (e.g. “Played with my son”), we tag social:family:son. If a name isn’t recognized as family, we might default to assuming friend or unknown social, which GPT can clarify.

The local parsing will produce an initial set of structured data:

  • initialTags (from hashtags and keyword rules)
  • peopleNames (from capitalized words or known patterns)
  • metrics
  • inferredDate (if phrases like yesterday are present)

These are then used or sent to the OpenAI step.
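
Putting these heuristics together, a minimal extractBasicMeta sketch (assuming chrono-node for date phrases; the patterns and metric key names are illustrative, not exhaustive):

// lib/localNlp.ts – a minimal sketch of the local extraction pass
import * as chrono from 'chrono-node';

export type BasicMeta = {
  tags: string[];
  people: string[];
  metrics: Record<string, number>;
  datetimeOverride?: Date;
};

export function extractBasicMeta(text: string): BasicMeta {
  const tags: string[] = [];
  const people: string[] = [];
  const metrics: Record<string, number> = {};

  // Hashtags: "#fitness" -> tag "fitness"
  for (const m of text.matchAll(/#(\w[\w:]*)/g)) tags.push(m[1].toLowerCase());

  // A few metric patterns: "5km", "30 min", "70kg"
  const km = text.match(/\b(\d+(?:\.\d+)?)\s?km\b/i);
  if (km) metrics.distance_km = parseFloat(km[1]);
  const mins = text.match(/\b(\d+)\s?(?:min|minutes)\b/i);
  if (mins) metrics.duration_min = parseInt(mins[1], 10);
  const kg = text.match(/\b(\d+(?:\.\d+)?)\s?kg\b/i);
  if (kg) metrics.weight_kg = parseFloat(kg[1]);

  // Naive people detection: capitalized word after "with" or "called"
  for (const m of text.matchAll(/\b(?:with|[Cc]alled)\s+([A-Z][a-z]+)\b/g)) {
    people.push(m[1]);
    tags.push(`person:${m[1]}`);
  }

  // Casual date phrases ("yesterday", "at 6pm"); chrono is eager, so
  // production code would filter out spurious matches like "in 30min"
  const parsed = chrono.parseDate(text);
  const datetimeOverride = parsed ?? undefined;

  return { tags: Array.from(new Set(tags)), people, metrics, datetimeOverride };
}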

OpenAI GPT API for NLP

For more nuanced understanding, especially to categorize events properly and catch subtle context, we use OpenAI’s GPT via their API. The GPT model (like gpt-4 or gpt-3.5-turbo) can take the event text and instructions, and output a JSON with the extracted info.

Prompt Design: We will craft a prompt for the model to extract the desired fields. For example:

System: "You are an assistant that helps categorize personal life events into structured data."

User: 
"Event: Went for a 5km run with Jim at the park.

Extract the following:
- tags: categories that describe the event (such as health, exercise, social, family, work, mood, etc.). Use hierarchical tags separated by colons if applicable (e.g. 'social:family', 'health:exercise').
- people: names of people involved (e.g. 'Jim').
- metrics: any numeric metrics (with a key name and value, e.g. distance_km: 5).
- datetime: if a specific time or date is mentioned in the text (otherwise blank).

Respond in JSON only."

We might include examples in the prompt to guide the model. However, since this is an automated system, we prefer a deterministic format: we instruct the model to output only a JSON object, e.g.:

{
  "tags": ["health", "exercise", "social:friends"],
  "people": ["Jim"],
  "metrics": { "distance_km": 5 },
  "datetime": null
}

In this example, GPT identified:

  • “run” -> health/exercise
  • “with Jim” -> social event, likely with a friend (if not known as family, assume friend) hence social:friends tag, and person “Jim”.
  • “5km” -> a metric (distance_km: 5).
  • “at the park” -> it might optionally note location, but since we handle location via geolocation, we might not ask GPT for location.

Using GPT in Code: We will likely use the OpenAI Node.js library or fetch API to call the endpoint. Pseudocode using fetch:

// Note: the Next.js serverless runtime provides a global fetch, so no import is needed.

async function getOpenAISuggestions(text: string) {
  const prompt = `
Event: ${text}
Extract the tags, people, metrics, and datetime from the event.
Return a JSON object with keys: tags, people, metrics, datetime.
If a field is not applicable, use an empty list or null.
  `;
  const body = {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful assistant for extracting structured data from life event descriptions." },
      { role: "user", content: prompt }
    ],
    temperature: 0.2,
    max_tokens: 150
  };
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify(body)
  });
  const json = await res.json();
  const reply = json.choices?.[0]?.message?.content;
  if (!reply) return null;
  // The model should respond with JSON. We need to parse it.
  try {
    const data = JSON.parse(reply);
    return {
      tags: data.tags || [],
      people: data.people || [],
      metrics: data.metrics || {},
      datetime: data.datetime || null
    };
  } catch (e) {
    console.error("Failed to parse GPT response", e, reply);
    return null;
  }
}

We will incorporate the local parsed results as well. For instance, we could feed the model the initial tags we found: “Known tags: [health, exercise]” or “Initial tags: … You can add more or refine.” This way GPT knows what we’ve already identified. However, careful prompt engineering is needed so as not to confuse it. The simpler route is to do local extraction independently and merge the results with GPT’s output, as shown in the API route example.

Example outputs:

  • Input: "Had a headache after sleeping only 4 hours."
    • Local might catch “headache” -> tag health, “4 hours” -> metric sleep_hours: 4.
    • GPT might output tags: [“health:headache”, “health:sleep”], people: [], metrics: {“sleep_hours”: 4}, datetime: null.
    • We merge to get tags [health, health:headache, health:sleep]. Possibly we don’t need both health and health:sleep if we consider hierarchical; but could keep both or just the specific.
  • Input: "Played with son at the playground."
    • Local might see “son” -> add tag social:family:son.
    • GPT might output tags: [“social:family”, “social:family:son”], people: [], (maybe it doesn’t list the son’s name because none given), metrics: {}, datetime: null.
    • Merge results: tags [social:family:son, social:family]. We might decide if storing both is needed. Possibly storing the deepest tag “social:family:son” is enough since it implies the hierarchy. For filtering by family generally, we could use a prefix match or also store the parent tag explicitly. We can consider storing both to simplify queries (redundancy).
  • Input: "Called Jim and talked about work projects."
    • Local: sees “Jim” -> person Jim, likely add person:Jim; sees “work” -> tag work.
    • GPT: might identify tags [“social:friends”, “work”], people [“Jim”], datetime: null.
    • Merge: tags [work, social:friends, person:Jim]. If we know Jim is a friend (GPT assumed friends), that’s fine. If Jim was actually a family member, GPT might not know unless specified in text. The user can always edit tags later if needed.

We should also be mindful of GPT output variability. Using temperature ~0.2 and clearly instructing JSON output helps. We might need to handle cases where GPT still returns extra text or formatting issues (hence the try/catch on JSON.parse).

NLP processing is done each time an event is logged (real-time enrichment). We might also leverage the scheduled batch jobs to retroactively improve or normalize tags (for example, if our tagging strategy evolves, a batch job could re-tag old events). But initially, per-event processing should suffice.

Tagging and Categorization

Tags are a central organizational tool in this system. We aim to allow flexible tagging by the user and intelligent suggestions by the system.

Tag Structure and Conventions

We support hierarchical tags by using a colon-separated naming convention. This is purely a naming strategy (in the database it’s just strings). For example:

  • A top-level category like “health”, with a sub-tag “sleep”, could be represented as health:sleep.
  • Multi-level: “social:family:son” indicates a social event, specifically family-related, specifically involving a son.
  • Tags can also be flat (single word) if no hierarchy is needed, like “work”, “mood”, “travel”.

We also use prefix tags for certain types:

  • person:<Name> – indicates an individual person mentioned. E.g. person:Jim, person:Alice.
  • We could have other prefixes like place:<Location> if we wanted to tag place names from text, but since we store location separately, we might not use place tags now.
  • Possibly project:<ProjectName> if user tags work projects, etc.

The hierarchy indicated by colons is mostly for human understanding and potential UI grouping. We are not (at this stage) enforcing relational integrity between those (since we didn’t model Tag as a separate entity). But the UI can choose to group tags by their prefix part. For example, in a tag filter UI, we could show a tree:

- health
   - sleep
   - exercise
   - headache
- social
   - family
       - son
       - daughter
   - friends
- person
   - Jim
   - Alice
- work
- mood

This can be derived from the tags present in the events.
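
A sketch of deriving that tree from the flat tag strings (TagNode and buildTagTree are illustrative names):

// Build a nested tree from colon-delimited tags, e.g. "social:family:son"
type TagNode = { name: string; children: Map<string, TagNode> };

function buildTagTree(tags: string[]): Map<string, TagNode> {
  const roots = new Map<string, TagNode>();
  for (const tag of tags) {
    let level = roots;
    for (const part of tag.split(':')) {
      if (!level.has(part)) level.set(part, { name: part, children: new Map() });
      level = level.get(part)!.children;
    }
  }
  return roots;
}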

Assigning Tags to Events

Tags come from multiple sources:

  • User-defined in text: If the user uses a hashtag or a known keyword, it effectively defines a tag. E.g., “#sleep” in the text directly becomes the “sleep” tag.
  • Inferred via keywords: Our local NLP might add generic tags based on keywords as described (like “meeting” -> work, “happy” -> mood).
  • Inferred via GPT: GPT might add more conceptual tags or refine categories (like adding a parent category the user didn’t explicitly mention, e.g. user says “ran 5km” and GPT outputs [“health”, “exercise”] where “health” is a broader tag).
  • Relationship-based inference: If a person is tagged and we know their relation to the user, we could add a hierarchical tag. For example, if the user’s son’s name is Jim, and the event says “Jim”, GPT might guess social:family or we can have a user-maintained mapping (e.g. in future, user could have a config that “Jim” -> family:son).
  • Manual tagging (UI): In the future we might allow the user to manually add/remove tags on an event after logging (e.g. an edit feature). For now, we assume the automated tagging plus any hashtags they typed covers it.

Examples revisited:

  • “called Jim” – The system might tag this as person:Jim. But is it social:family or social:friends? If there’s no indication, it might assume friend (social:friends). If the user wants Jim categorized as family, they could edit it, or in the text they’d say “called my brother Jim”, which our parser would catch (“brother” -> family). So context matters. GPT, even with world knowledge, won’t know the relation unless it’s stated. In a single-user scenario, we could let the user correct or train the system via some configuration outside the scope of this design.
  • “played with son” – We tag social:family:son. This implies a person (the son) but since not named, we might not add a person: tag (no name). We just categorize under family:son. The system knows it’s a family social event.

Storage: We decided on storing tags as an array of text in each Event. For querying:

  • To get all events under a category (like all “health” events), we could search for tags that start with “health”. In SQL, that could be tags @> '{health}' (array contains health) OR any tag in the array matching 'health:%'. We can handle that either in queries or in memory (see the SQL sketch after this list).
  • Alternatively, as mentioned, storing each level as its own tag too: e.g. if an event is tagged social:family:son, we could also store social and social:family in the array for redundancy. That way searching for “social:family” finds that event directly because it has that tag too. This duplication might be okay if we manage it (ensuring consistency). But it complicates adding/removing tags. For now, we won’t duplicate tags; we’ll just interpret the colon notation when filtering.
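
For illustration, a raw SQL query matching an exact tag or any of its descendants:

-- Events tagged exactly 'health' or with any 'health:*' sub-tag
SELECT *
FROM events
WHERE tags @> ARRAY['health']
   OR EXISTS (
     SELECT 1 FROM unnest(tags) AS t WHERE t LIKE 'health:%'
   );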

Tag Management: In the UI, we might list all unique tags from events to allow filtering. If multi-user in the future, tags would be per user. If needed, we could have an endpoint like /api/tags that aggregates distinct tags from the events table (SELECT DISTINCT UNNEST(tags)). However, since the number of tags is manageable, the frontend can also derive it from fetched events.

Edge Cases:

  • An event might end up with many tags. It’s fine as long as it’s useful (one could imagine an event touching multiple areas).
  • If GPT suggests a tag that the user doesn’t agree with, currently they would have to manually edit the event (not in scope, but a possible future feature). In a single-user scenario, it’s okay if a tag is a bit off; the user can ignore it. But we aim to be reasonably accurate with our rules + GPT.

Location Metadata Handling

Location data for events is optional but useful for context (e.g. whether something happened at home, work, travel logs, etc.). We incorporate location as follows:

  • If the user opts to attach location when logging an event, we use the Browser Geolocation API. On clicking the “Add Location” button (or toggling it on), we call:

    navigator.geolocation.getCurrentPosition(
      (pos) => {
        // success: store the coordinates in form state
        setLocation({ lat: pos.coords.latitude, lon: pos.coords.longitude });
      },
      (err) => {
        console.error(err); // handle denial or error
      }
    );

    We might call this immediately on toggle so that by the time the user submits, we have the coords (or, if it takes time, we could allow submitting and fetching location concurrently, but waiting is simpler).
  • Once we have lat and lon, we can allow the user to optionally enter a label. For example, if this lat/lon corresponds to their home, they might type “Home”. We don’t have automatic reverse geocoding in this design (to keep things simple and free). In future, integration with a geocoding API could turn coordinates into an address or place name.
  • In the event payload (to the API), we include location: { lat, lon, label }. The server just stores these in location_lat, location_lon, location_label.

Data usage:

  • On the timeline, if an event has a location, we can display it. Perhaps just show the label if present (e.g. “🏠 Home” or “📍 Central Park”). If no label, we could display coordinates truncated or an icon that on hover shows the coordinates.
  • We could also allow clicking it to open Google Maps (with the lat/lon) if desired.
  • Filtering by location isn’t explicitly required, but we might consider a simple filter like “show events near me” or by label name. Not in initial requirements though.

Privacy: Since this is a personal tool, the location is just for the user’s own reference. We store it in DB as plain numbers. If multi-user later, location data might be sensitive so we would ensure only the user can see their locations.

Alternate approach: Instead of geolocation, the user might type location in text (e.g. “at the park” or “in London”). We could attempt to parse that via GPT or rules (like if text says “in Paris”, we detect place). However, reliable place extraction would require additional NLP (like using a location entity recognition). GPT could probably do it if asked, but then converting that to coordinates is another step. So initial design sticks to explicit geolocation for accuracy.

Timeline View (Events UI)

The timeline view is the main interface where the user can review and browse their logged events. It will display events grouped by date, and provide filtering controls to narrow down the list.

Display and Layout

We will group events by date (year-month-day). For each date that has one or more events, we show a heading with that date, and under it, a list of events (sorted by time).

For example, the timeline might look like:

## April 21, 2025 (Monday)
- 18:30 – Went for a 5km run with Jim at the park [Tags: health, exercise, social:friends, person:Jim]
- 08:00 – Had a headache in the morning [Tags: health:headache, health:sleep]
- 07:00 – Woke up after 4 hours of sleep [Tags: health:sleep]

## April 20, 2025 (Sunday)
- 21:00 – Watched a movie with family [Tags: leisure, social:family]
- 15:30 – Called Mom and Dad [Tags: social:family, person:Mom, person:Dad]
...

(This is textual representation; in the actual UI it will be styled nicely.)

Implementation: We can implement the timeline as a React component (Timeline.tsx). Possible approach:

  • Fetch the events data from the server (either via the API or via a server component).
  • Group the events by date. In JavaScript, we can use Array.reduce to partition events by occurred_at date (ignoring time). Or use a library like date-fns to format and compare dates.
  • Sort dates in descending order (most recent at top) or ascending (older first) depending on preference. Likely recent first is more useful to see latest events at top.
  • For each date group, render a section with a header and list items.
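
A sketch of the grouping step (names are illustrative; assumes the Event type defined earlier):

// Partition events into [dateKey, events[]] pairs, newest date first.
function groupEventsByDate(events: Event[]): Array<[string, Event[]]> {
  const groups = new Map<string, Event[]>();
  for (const ev of events) {
    const day = ev.occurred_at.slice(0, 10); // "YYYY-MM-DD" prefix of the ISO string
    if (!groups.has(day)) groups.set(day, []);
    groups.get(day)!.push(ev);
  }
  return [...groups.entries()]
    .sort(([a], [b]) => b.localeCompare(a)) // most recent day first
    .map(([day, evs]): [string, Event[]] => [
      day,
      [...evs].sort((x, y) => y.occurred_at.localeCompare(x.occurred_at)), // newest first within a day
    ]);
}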

UI Components: Using shadcn/ui, we might use something like Accordion for collapsible date sections, or simply headings (<h2> or <h3>) styled with their typography classes and a <ul> for events.

  • Each event item can be in a Card or just a list item with some styling (maybe using shadcn’s Separator between items).
  • We can use small <Badge> components for tags (if shadcn/ui has a badge or we can style a span).
  • The time can be formatted to e.g. “HH:mm” and shown before the text.

Example event list item JSX (pseudo):

<li className="py-2 flex flex-col">
  <div className="text-sm text-muted">{formatTime(event.occurred_at)} – {event.text}</div>
  <div className="mt-1 space-x-2">
    {event.tags.map(tag => <Badge key={tag}>{tag}</Badge>)}
    {event.location && <Tooltip content={`${event.location.lat.toFixed(3)},${event.location.lon.toFixed(3)}`}>
        <IconMapPin /> {event.location.label || 'map'}
      </Tooltip>}
  </div>
</li>

Here, <Badge> could be a styled span with background, <Tooltip> and <IconMapPin> to show location details on hover.

We will also integrate the event logging form (from earlier section) at the top of the timeline so the user can add a new event without leaving this page.

Filtering Controls

To help the user find events, we include filters:

  • Tag Filter: A multi-select dropdown or a set of checkboxes allowing the user to select one or multiple tags to filter. If tags are selected, the timeline only shows events that include all of those tags (logical AND filter). Alternatively, we could do OR (any of those tags), but AND is usually more useful to narrow down (this can be decided in UI design).
    • Implementation: we can gather all unique tags from events to populate the filter options. (If this list is long, a searchable dropdown is useful.)
    • We might use shadcn/ui’s Popover + Checkbox for a custom multi-select, or a Select component if it supports multi-select (not sure if shadcn’s does out of the box).
  • Keyword Search: A simple text input where user can type a keyword. This will filter events to those whose description (or maybe tags as well) contain that substring (case-insensitive).
  • Date Range Filter: A start date and end date picker to only show events in that range. We can use a dual Calendar or two date inputs. Possibly a slider if focusing on recent X days, but date pickers give full control. We might default to showing all if no range selected.
  • Filtering logic: This can all happen client-side since we have the events data. Selecting filters will update component state and we recompute the visible list. Because a single user’s data likely isn’t huge, client-side filtering is fine. If it became large, we could offload filtering to queries (using the API with query params as described).

If implementing on the client, filtering function might look like:

function filterEvents(events: Event[]): Event[] {
  return events.filter(event => {
    const matchesTags = selectedTags.length === 0 || selectedTags.every(tag => event.tags.includes(tag));
    const matchesSearch = searchQuery === "" || event.text.toLowerCase().includes(searchQuery.toLowerCase()) 
                            || event.tags.join(" ").toLowerCase().includes(searchQuery.toLowerCase());
    const eventDate = new Date(event.occurred_at);
    const matchesDate = (!startDate || eventDate >= startDate) && (!endDate || eventDate <= endDate);
    return matchesTags && matchesSearch && matchesDate;
  });
}

We can call this in render and group the filtered list accordingly.

Interactivity: The filter inputs (tag picker, search box, date pickers) will be controlled components in React, updating state (e.g. selectedTags, searchQuery, startDate, endDate). The timeline list will re-render based on current state.

Integration with Next.js App Router

We have options to fetch events:

  • Server-side fetch: In Next.js App Router, a page component can be an async Server Component that does const events = await getAllEvents(); (where getAllEvents might call the database or our internal API). This would return events at build/SSR time. Then the page could pass these to a client component for rendering (for filtering functionality).
    • Pro: initial page load can have all events without an extra client request.
    • Con: any new event added via form won’t be in that list unless we revalidate or refetch.
    • We can use Next’s revalidation or just push the new event into state when the form submits successfully.
  • Client-side fetch: Alternatively, make the timeline page a client component that uses useEffect or a SWR hook to fetch /api/events. Then we always get latest events but with a loading state initially.
    • For a single-user personal app, either way is fine. We might lean toward simplicity: fetch on the client with a loading spinner.

We can also use the Supabase JS client on the client directly with an anon key to subscribe to changes (Supabase can do realtime), but that may be overkill. Simpler: after adding an event, just update state locally (since we get the event in response).

Pagination: If the user has many years of data, we might eventually need to paginate or virtualize the list. For now, we assume manageable volume and load all. The date grouping helps compress the view.

Edit/Delete Events (Future consideration)

The design hasn’t explicitly mentioned editing or deleting events, but it’s worth noting:

  • We could add a small “Edit” or “Delete” button on each event item (visible on hover perhaps). This would call corresponding API routes (PATCH or DELETE on /api/events/[id]).
  • Given it’s single user, there’s no conflict in deletion. For MVP we might skip implementing deletion or editing, but architecture allows it (just another API handler and some UI).
  • If implemented, editing could allow user to correct tags or text and we’d perhaps re-run NLP or allow them to manually adjust tags.

Scheduled Batch Analysis (Insights Generation)

To provide deeper insights, we run batch analysis jobs on the event data. This is done via Vercel’s Cron Jobs feature, which can trigger our API endpoints on a schedule.

Vercel Cron Setup

We will add a configuration in vercel.json to schedule our analysis function. For example:

// vercel.json
{
  "crons": [
    {
      "path": "/api/cron/analyze",
      "schedule": "0 0 * * *"
    }
  ]
}

This cron config means: call the /api/cron/analyze endpoint at midnight (UTC) every day. We can adjust the schedule as needed (perhaps daily is fine to generate fresh insights; or multiple schedules for different analysis tasks, e.g. a quick nightly analysis and a heavier weekly one).

We create a route handler at app/api/cron/analyze/route.ts to handle these requests. Vercel will invoke it as a GET request at the scheduled times (only on production deploy).

Note: Cron jobs on Vercel run only on the production environment, not on preview deployments.
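
Since this route is publicly reachable, it is worth guarding. A sketch assuming Vercel’s CRON_SECRET convention (when the CRON_SECRET environment variable is set, Vercel sends it as a bearer token on cron invocations); the check could sit at the top of the handler shown in the next section:

// app/api/cron/analyze/route.ts – guarding the cron endpoint
import { NextRequest, NextResponse } from 'next/server';

export async function GET(req: NextRequest) {
  // Reject callers that don't present the cron secret
  if (req.headers.get('authorization') !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
  }
  // ...analysis logic as shown below...
  return NextResponse.json({ status: 'ok' });
}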

Analysis Tasks

Within the scheduled function, we will perform various computations on the entire set of events (or on recent events). The goal is to derive insights such as correlations and trends. Here are some of the specific analyses we plan:

  1. Correlation: Late Sleep → Headache (example) – This was mentioned as an example, and we can generalize it to find if events with certain tags lead to other events.
    • Define what “late sleep” means: perhaps events tagged sleep or health:sleep that have an occurred_at time after a certain hour (say 00:00 or 1am).
    • Define “headache”: events tagged health:headache (or containing “headache”).
    • Compute how often a headache event occurs on the day after a late sleep event, compared to baseline headache frequency.
    • If the probability is significantly higher, that’s a correlation insight.
    • We then create an Insight record: type “correlation”, title “Late Sleep & Headaches”, description like “Out of X late-night sleep instances, you experienced a headache the next day Y times (Z%).”
    • This can be extended to other correlations: e.g. “Low Sleep Hours → Mood: sad”. If user logs mood, or “High Stress → Insomnia”, etc. For now, we’ll explicitly code a few plausible ones:
      • Late sleep vs headache.
      • Low sleep duration vs feeling tired (if they log something like “tired” or “coffee intake high next day”).
      • Perhaps if weight is tracked, correlate diet tags or exercise with weight changes.
      • Social interactions vs mood (e.g. more social days correlate with better mood).
    • It might be too ambitious to find all automatically, so we decide which to implement.
  2. Trends:
    • Tag frequency trends: e.g. compare this week vs last week or this month vs last month for certain tags. If a tag usage has increased or decreased significantly, record an insight. Example: “Exercise events increased: You logged 12 exercise sessions this month vs 8 last month.”
    • Daily or weekly patterns: e.g. “Most productive day: You log most work events on Tuesdays on average.” or “You tend to sleep later on Fridays.”
    • Metrics trend: If the user tracks weight or run times, we can compute the change over time. E.g. “Weight down 2kg compared to 3 months ago.”
    • New tags introduction: If a new tag appears that hasn’t been used before (signifying a new activity or concern), it could be noted (maybe not needed but interesting).
    • Summaries: e.g. “Total events logged: 100 events in 2025 so far.”
  3. Statistics:
    • Averages or totals: e.g. “Average sleep hours: 7.2h”, “Total distance run this month: 42km”.
    • Most frequent tags in a period: “This week’s top categories: social (5 events), work (4 events), health (3 events).”
    • If mood tracking was a thing: average mood rating etc.

For each insight we derive, we’ll insert a row into the insights table. We may choose to delete old insights each time and insert fresh ones so that the table always has the latest. Alternatively, we keep a history (with timestamp) of insights, but then the UI would need to filter the latest ones. Simpler: clear out old each day and replace with new.
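
A sketch of that replace-all step with the Supabase client (supabase-js requires a filter on delete, hence the catch-all condition):

// Inside the analyze handler, after computing `insights`:
await supabase.from('insights').delete().not('id', 'is', null); // matches every row
const { error: insertErr } = await supabase.from('insights').insert(insights);
if (insertErr) console.error('Failed to store insights', insertErr);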

Batch Processing Implementation: In app/api/cron/analyze/route.ts:

import { NextResponse } from 'next/server';
import { supabase } from '@/lib/supabase'; // shared server-side client (service role); path illustrative
import type { Insight } from '@/lib/types'; // Insight row type from the data model; path illustrative

export async function GET() {
  // 1. Fetch all events (or only the last N days' worth if the table grows large)
  const { data: events, error } = await supabase.from('events').select('*');
  if (error) {
    console.error("Analysis fetch error", error);
    return NextResponse.json({ status: 'error', error: error.message }, { status: 500 });
  }

  const insights: Omit<Insight, 'id' | 'created_at'>[] = [];
  const userId: string | null = /* single user id or null */ null;

  // 2. Example: Late sleep vs headache correlation.
  // We use UTC consistently (getUTCHours + toISOString) so the hour check
  // and the day key agree; a refinement could use the user's local timezone.
  const lateSleepDays = new Set<string>();
  const headacheDays = new Set<string>();
  for (const ev of events) {
    const dt = new Date(ev.occurred_at);
    const dateKey = dt.toISOString().substring(0, 10); // YYYY-MM-DD
    // Check for late sleep (past midnight)
    if (ev.tags?.includes('sleep') || ev.tags?.includes('health:sleep')) {
      const hour = dt.getUTCHours();
      if (hour >= 0 && hour <= 4) { // between midnight and 4am counts as late
        lateSleepDays.add(dateKey);
      }
    }
    // Check for a headache event
    if (ev.tags?.includes('health:headache') || ev.text.toLowerCase().includes('headache')) {
      headacheDays.add(dateKey);
    }
  }
  // Now correlate: count how many late-sleep days had a headache the next day
  let correlationCount = 0;
  lateSleepDays.forEach(day => {
    // Parse as UTC midnight and advance one day to get the next-day key
    const nextDay = new Date(`${day}T00:00:00Z`);
    nextDay.setUTCDate(nextDay.getUTCDate() + 1);
    const nextKey = nextDay.toISOString().substring(0, 10);
    if (headacheDays.has(nextKey)) {
      correlationCount++;
    }
  });
  if (lateSleepDays.size > 0) {
    const percent = Math.round((correlationCount / lateSleepDays.size) * 100);
    insights.push({
      user_id: userId,
      type: 'correlation',
      title: 'Late Sleep -> Headache',
      description: `On ${correlationCount} of ${lateSleepDays.size} late-night sleeps, a headache was recorded the next day (${percent}%).`,
      data: { lateSleepCount: lateSleepDays.size, headacheNextDayCount: correlationCount, percentage: percent }
    });
  }

  // 3. Example: Tag frequency trend (month-over-month)
  // (Compute counts per tag for this month vs last month)
  const now = new Date();
  const thisMonth = now.getMonth();
  const lastMonth = (thisMonth === 0) ? 11 : thisMonth - 1;
  const thisMonthYear = now.getFullYear();
  const lastMonthYear = (lastMonth === 11) ? thisMonthYear - 1 : thisMonthYear;
  const tagCountsThisMonth: Record<string, number> = {};
  const tagCountsLastMonth: Record<string, number> = {};
  for (const ev of events) {
    const dt = new Date(ev.occurred_at);
    const m = dt.getMonth();
    const y = dt.getFullYear();
    if ((m === thisMonth) && (y === thisMonthYear)) {
      ev.tags?.forEach(tag => { tagCountsThisMonth[tag] = (tagCountsThisMonth[tag] || 0) + 1; });
    } else if ((m === lastMonth) && (y === lastMonthYear)) {
      ev.tags?.forEach(tag => { tagCountsLastMonth[tag] = (tagCountsLastMonth[tag] || 0) + 1; });
    }
  }
  // Find significant increases: at least 3 events this month and a 1.5x jump
  for (const tag in tagCountsThisMonth) {
    const thisCount = tagCountsThisMonth[tag];
    const prevCount = tagCountsLastMonth[tag] || 0;
    if (thisCount >= 3 && thisCount > prevCount * 1.5) {
      insights.push({
        user_id: userId,
        type: 'trend',
        title: `Increase in "${tag}" events`,
        description: `You logged ${thisCount} "${tag}" events this month, up from ${prevCount} last month.`,
        data: { tag, lastMonth: prevCount, thisMonth: thisCount }
      });
    }
  }

  // (More analyses can be added similarly...)

  // 4. Save insights to DB:
  // Remove old insights for this user (we replace the full set each run).
  // Note: .eq() does not match NULL rows, so use .is() in single-user mode.
  if (userId === null) {
    await supabase.from('insights').delete().is('user_id', null);
  } else {
    await supabase.from('insights').delete().eq('user_id', userId);
  }
  if (insights.length > 0) {
    await supabase.from('insights').insert(insights);
  }

  return NextResponse.json({ status: 'ok', insightsGenerated: insights.length }, { status: 200 });
}

The above code:

  • Fetches all events (in a real scenario, we could filter to the last year or so if the dataset is huge).
  • Computes one correlation and one trend as examples.
  • Deletes existing insights (for that user) and inserts new ones.
  • Returns a summary (for debugging/logging).

We can have many such analysis blocks, and should separate some logic into helper functions if it gets large; one possible factoring is sketched below.
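
As one possible factoring, here is a sketch of a generic next-day correlation helper; the EventRow shape and the predicate-based signature are assumptions, not settled design:

// Sketch: generic "tag A today -> tag B tomorrow" correlation helper
type EventRow = { occurred_at: string; tags: string[] | null; text: string };

function nextDayCorrelation(
  events: EventRow[],
  isCause: (ev: EventRow) => boolean,
  isEffect: (ev: EventRow) => boolean
) {
  const causeDays = new Set<string>();
  const effectDays = new Set<string>();
  for (const ev of events) {
    const day = new Date(ev.occurred_at).toISOString().substring(0, 10);
    if (isCause(ev)) causeDays.add(day);
    if (isEffect(ev)) effectDays.add(day);
  }
  // Count cause-days followed by an effect-day
  let hits = 0;
  causeDays.forEach(day => {
    const next = new Date(`${day}T00:00:00Z`);
    next.setUTCDate(next.getUTCDate() + 1);
    if (effectDays.has(next.toISOString().substring(0, 10))) hits++;
  });
  const percent = causeDays.size > 0 ? Math.round((hits / causeDays.size) * 100) : 0;
  return { causeCount: causeDays.size, hits, percent };
}

// Usage: the late-sleep/headache analysis reduces to two predicates
const lateSleepHeadache = nextDayCorrelation(
  events,
  ev => !!ev.tags?.includes('health:sleep') && new Date(ev.occurred_at).getUTCHours() <= 4,
  ev => !!ev.tags?.includes('health:headache') || ev.text.toLowerCase().includes('headache')
);

New correlations then become one-liners, which keeps the cron route readable as the list of analyses grows.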

We should guard against running analyses on insufficient data (e.g. skip the correlation entirely when there are no late-sleep events, as shown).

The schedule is daily, but we don’t strictly need to regenerate, say, month-over-month trends every day. Recomputing daily is harmless, though: the values simply stay mostly the same until the month rolls over.

Multi-user considerations: In the future, with multiple users, the cron job would have to iterate over each user’s data, or (if using RLS and Supabase functions) perhaps run a separate process per user. Since we assume small scale, one job can handle all users in a loop (e.g. query events grouped by user, then run the analysis per group).

Analytics Dashboard (Visualization)

The analytics dashboard provides a visual, summary-level view of the data. It complements the timeline (which is textual and detailed) with graphs and highlighted insights.

Dashboard Overview

We will create a separate page in Next.js, e.g. /dashboard (or /analytics). This page will present:

  • Charts showing various aspects over time or distribution.
  • Insight cards displaying the results from the latest batch analysis (correlations/trends).

Likely, we will use a combination of line charts, bar charts, and maybe pie charts:

  • A timeline chart of the count of events per day or per week. This shows how active the user is at logging and how event frequency changes over time.
  • A tag breakdown chart – e.g. a bar chart of the top 5 tags by frequency (overall or in recent period).
  • If numeric metrics like weight or running distance are logged over time, a line chart for those metrics (date on X axis, value on Y).
  • Possibly a scatter plot for correlations if we have numeric pairs (for example, sleep hours vs mood rating if that existed).
  • A pie chart for categorical distribution, like proportion of events by category (health vs work vs social, etc.) – though pie might get cluttered if too many categories.

We’ll use Recharts, which provides React components for these charts. Since Recharts uses DOM/SVG, the chart components should be client-side (we add use client at the top of the component file in Next.js 13).

Data for Dashboard

The data can come from:

  • The insights table: We will query insights to get all current insights. Those with numeric data can be used for charts or displayed as text.
  • Direct queries on events for specific charts:
    • For event count timeline, instead of storing that in insights, we can compute it on the fly or via an API.
    • E.g. we can run a SQL query: SELECT date_trunc('day', occurred_at) AS day, count(*) FROM events GROUP BY day ORDER BY day; – Supabase can expose such a query as a Postgres function called via RPC, which is cleaner than coaxing a GROUP BY out of its query builder (see the sketch after this list).
    • Or, more simply, fetch all events to the client and count there. That could be heavy with many events, though, and is unnecessary if we can aggregate server-side.
    • Alternatively, implement a small API route such as /api/stats that returns the needed aggregates (either integrated with the analysis job or kept separate).
    • The timeline view already fetches all events, so we could reuse that data via a shared context; but to keep the views decoupled, let’s assume the dashboard fetches what it needs separately.
    • For the MVP, fetching all events and computing aggregates in JS is okay (assuming modest volume). To transfer less data, there are two options:
      • For example, GET /api/events?summary=true could return daily counts or tag counts instead of full events.
      • Or reuse aggregates the insights job has already prepared.
  • Using insights: We might create certain insights specifically for charts, e.g. an insight of type “timeseries” with a data field containing an array of daily counts. But this is perhaps overloading the insight table.
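
To make the aggregated-query option concrete, here is a sketch of calling a Postgres function through Supabase’s rpc(). The daily_event_counts function is hypothetical; it would be created once in the database (its SQL is shown in the comment):

// Sketch: server-side aggregation via a Postgres function exposed over RPC.
// Assumes a function like this was created in the Supabase SQL editor:
//   create or replace function daily_event_counts()
//   returns table(day date, count bigint)
//   language sql stable as $$
//     select date_trunc('day', occurred_at)::date as day, count(*)
//     from events group by 1 order by 1;
//   $$;
const { data: dailyCounts, error } = await supabase.rpc('daily_event_counts');
// dailyCounts: [{ day: '2025-04-01', count: 3 }, ...], ready for the timeline chart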

To keep it straightforward:

  • Use insights for high-level statements (like correlation findings).
  • Use direct computation for charts like event counts and tag frequencies.

Chart Implementation

We will create React components for each chart using Recharts:

  • For example, EventsTimelineChart for events per day.
  • TopTagsChart for tag distribution.
  • Possibly MetricsChart for any numeric metric series (like weight over time).
  • CorrelationChart if needed (like a special chart if we had data for correlation, e.g. sleep hours vs something).

Recharts usage example:

"use client";  // ensure this is a client-side component
import { LineChart, Line, XAxis, YAxis, Tooltip, CartesianGrid } from 'recharts';

const EventsTimelineChart = ({ data }) => {
  // data expected as array of { date: '2025-04-01', count: 3 } etc.
  return (
    <LineChart width={600} height={300} data={data}>
      <CartesianGrid stroke="#ccc" strokeDasharray="5 5"/>
      <XAxis dataKey="date" />
      <YAxis allowDecimals={false}/>
      <Tooltip />
      <Line type="monotone" dataKey="count" stroke="#8884d8" />
    </LineChart>
  );
};

We will need to supply data to this component. Possibly, we can generate the last 30 days of counts:

// For example, prepare data for the last 30 days
const thirtyDaysAgo = new Date();
thirtyDaysAgo.setDate(thirtyDaysAgo.getDate() - 30);
const eventsLast30Days = events.filter(e => new Date(e.occurred_at) >= thirtyDaysAgo);
// Count events per day (plain JS; lodash's _.countBy would work equally well)
const countsByDay: Record<string, number> = {};
for (const e of eventsLast30Days) {
  const day = e.occurred_at.substring(0, 10);
  countsByDay[day] = (countsByDay[day] || 0) + 1;
}
const timelineData = Array.from({ length: 30 }, (_, i) => {
  const d = new Date();
  d.setDate(d.getDate() - (29 - i));
  const dateStr = d.toISOString().substring(0, 10);
  return { date: dateStr, count: countsByDay[dateStr] || 0 };
});

Then <EventsTimelineChart data={timelineData}/>. This yields a line chart of event counts for each of the last 30 days.

For tag distribution, we could take top N tags overall:

const tagCounts: Record<string, number> = {};
events.forEach(e => {
  e.tags?.forEach(tag => { tagCounts[tag] = (tagCounts[tag] || 0) + 1; });
});
const topTags = Object.entries(tagCounts)
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5);  // top 5
const topTagsData = topTags.map(([tag, count]) => ({ tag, count }));

Then use a BarChart:

<BarChart width={400} height={300} data={topTagsData}>
  <XAxis dataKey="tag" />
  <YAxis />
  <Tooltip />
  <Bar dataKey="count" fill="#82ca9d" />
</BarChart>

We might need to style or angle the X-axis labels if they’re long (like “social:family:son”).

For metrics like weight over time:

  • If the user logs weight, presumably there are events with a weight_kg metric. We can filter events that have metrics.weight_kg, take the value and date, and build a line chart from that (see the sketch after this list).
  • Similarly for run distance or others.
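
As a sketch of extracting such a metric series (assuming events carry the metrics JSON field described earlier, and that weight is stored under a weight_kg key):

// Sketch: build a { date, value } series from events that carry a weight_kg metric
const weightData = events
  .filter(e => typeof e.metrics?.weight_kg === 'number')
  .map(e => ({ date: e.occurred_at.substring(0, 10), value: e.metrics.weight_kg }))
  .sort((a, b) => a.date.localeCompare(b.date));
// weightData can then be passed to a MetricsChart (a line chart with date on X, value on Y)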

If there’s an insight about a correlation, how should we visualize it? When it boils down to a single number (like a percentage), we can simply display the text. With enough data points we could show a scatter plot (e.g. bedtime vs headache occurrence), but that’s hard to chart without cluttering the UI, so plain text is likely fine for those.

Insight Display

We will show the insights from the insights table on the dashboard. This can be a list or cards:

  • Each insight has a title and description (and maybe type). We could style each as a small card or list item with an icon depending on type (e.g. 🔗 for correlation, ⬆️ for trend, ℹ️ for stat).
  • Example: A card that says:
    • Late Sleep -> Headache: On 5 of 7 late-night sleeps, a headache was recorded next day (71%).
    • Increase in “exercise” events: You logged 12 exercise events this month, up from 8 last month.
    • These are pulled from insights.description.

We will fetch the insights via an API or directly from DB in a server component. Could do:

const { data: insights } = await supabase.from('insights').select('*').order('created_at', {ascending: false});

Given we refresh and replace them daily, we might not even need the ordering; we can simply take them all.

Dashboard Page Layout

We can break the dashboard into sections:

  • Highlights/Insights: a list of text insights at the top.
  • Charts section: could be a grid of cards, each containing a chart with a title.
    • e.g. “Events in Last 30 Days” line chart,
    • “Top 5 Tags” bar chart,
    • maybe “Cumulative Events by Category” pie chart (just if it looks nice),
    • “Weight Over Time” line chart (if relevant),
    • etc.

Using shadcn/ui, we can use the Card component for each chart. For example:

<Card className="p-4">
  <CardHeader><CardTitle>Events in Last 30 Days</CardTitle></CardHeader>
  <CardContent>
    <EventsTimelineChart data={timelineData}/>
  </CardContent>
</Card>

We create a responsive grid using CSS (Tailwind utility like grid grid-cols-1 md:grid-cols-2 gap-4).

We should mark the entire page or necessary components with use client because Recharts usage is client-side. Or we keep the page as a server component that fetches data, and within it include individual client components for each chart, passing data via props. That might be ideal:

  • Next.js server component queries events and insights.
  • It calculates or passes down aggregated data (or raw events) to the child chart components.
  • The child components, being client components, render the charts.

This way, data fetching is done server-side (no loading state for data) and charts still render on client.

Alternatively, we can fetch inside the client components but that would cause more network calls from the browser. Better to do it in one go on server if possible.

However, calling Supabase from a server component is possible (especially using the Supabase JS client with the service key, as in the API routes). There are a couple of options:

  • The server component could call our own API endpoints via fetch (internal fetch to /api).
  • Or use Supabase directly by instantiating the client with the service key (taking care the key never reaches the browser; since this code runs purely server-side, that’s fine).
  • We already have event and insight retrieval code in the API routes, so querying the database directly here duplicates a little logic; that’s an acceptable trade-off.

For clarity, maybe use the API:

// Note: a server-side fetch needs an absolute URL, so baseUrl would come
// from an env var or the incoming request's headers
const eventsRes = await fetch(`${baseUrl}/api/events`);
const events = await eventsRes.json();
const insightsRes = await fetch(`${baseUrl}/api/insights`);
const insights = await insightsRes.json();

We would need a GET /api/insights route returning the latest insights, implemented much like the events GET route: simply select from the table.

/api/insights (GET):

import { NextResponse } from 'next/server';
import { supabase } from '@/lib/supabase'; // same shared server-side client as the other routes

export async function GET() {
  const { data: insights, error } = await supabase.from('insights').select('*').order('created_at', { ascending: false });
  if (error) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
  return NextResponse.json(insights, { status: 200 });
}

(This will return all insights; currently that’s just the latest set, since we replace them on each run, but it could be more if we kept history.)

Example Analytics UI Snippet

// Inside Dashboard page component (server component)
export default async function DashboardPage() {
  const events = await getAllEvents();    // maybe calls supabase or our API
  const insights = await getInsights();   // fetch from insights table

  // Prepare data for charts
  const timelineData = computeLast30DaysCounts(events);
  const topTagsData = computeTopTags(events);

  return (
    <div className="p-4 space-y-6">
      <h1 className="text-2xl font-bold">Analytics Dashboard</h1>

      {/* Insights Highlights */}
      <section>
        <h2 className="text-xl font-semibold mb-2">Insights</h2>
        <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
          {insights.map(insight => (
            <Card key={insight.id} className="bg-muted p-4">
              <CardTitle>{insight.title}</CardTitle>
              <CardContent>{insight.description}</CardContent>
            </Card>
          ))}
        </div>
      </section>

      {/* Charts */}
      <section>
        <h2 className="text-xl font-semibold mb-4">Trends & Charts</h2>
        <div className="grid grid-cols-1 md:grid-cols-2 gap-8">
          <Card>
            <CardHeader><CardTitle>Events in Last 30 Days</CardTitle></CardHeader>
            <CardContent><EventsTimelineChart data={timelineData} /></CardContent>
          </Card>
          <Card>
            <CardHeader><CardTitle>Top 5 Tags (All Time)</CardTitle></CardHeader>
            <CardContent><TopTagsChart data={topTagsData} /></CardContent>
          </Card>
          {/* Additional charts if needed */}
        </div>
      </section>
    </div>
  );
}

// Example TopTagsChart component (client)
"use client";
import { BarChart, Bar, XAxis, YAxis, Tooltip } from 'recharts';

export function TopTagsChart({ data }) {
  return (
    <BarChart width={300} height={250} data={data}>
      <XAxis dataKey="tag" tick={{ fontSize: 10 }} interval={0} angle={-40} textAnchor="end" />
      <YAxis allowDecimals={false} />
      <Tooltip />
      <Bar dataKey="count" fill="#3182ce" />
    </BarChart>
  );
}

We adjust styling to fit the content. The tag names on the X-axis might need rotating if they’re long (as shown with angle={-40}). The color is just an example.

This dashboard gives a quick overview: the user can immediately see if there are any notable changes or patterns without digging through the timeline.

Extensibility for Multi-User Support

While the current design assumes a single user (and thus skips authentication and user management), it’s structured to allow adding multi-user support with minimal changes:

  • User Accounts & Authentication: We could integrate Supabase Auth to handle user sign-up/login. Supabase provides email/password auth or OAuth. Alternatively, NextAuth could be used with credentials or third-party providers, storing users in the database. For simplicity, Supabase Auth might be preferred as it’s built-in and we are already using Supabase.
  • User Identification: Every table (events, insights, etc.) has a user_id field. In single-user mode, we might leave user_id as null or use a constant. In multi-user mode, this will be the unique identifier of the user (Supabase uses UUIDs for user IDs by default). All queries will need to filter by user_id to scope data to the logged-in user.
  • API Changes:
    • The API routes (events, insights) would need to authenticate requests, for example using JWTs from Supabase or session cookies from NextAuth. We might have to verify the user’s identity on each request (Next.js middleware can help by decoding a token, or we can call Supabase’s getUser on the auth token).
    • Once we have the user ID from the auth, we apply it in DB queries: e.g. supabase.from('events').select('*').eq('user_id', currentUserId).
    • On inserts, use user_id: currentUserId.
  • Row-Level Security (optional): If using Supabase auth and calling Supabase directly from the client, we would enable RLS on tables so that each user can only access their rows. However, in our architecture, we are mostly using a service role on the server side. We could continue with server-side access (and just trust our server code to filter by user), or we can switch to using the user’s Supabase JWT on the client side for direct queries. Given the small scope, server-side filtering is fine.
  • UI Considerations: We would need a login page and account management UI if multi-user. This is outside our current scope, but the addition should not affect the core event logging and analytics logic, except ensuring the user context is available.
  • Multi-user Cron: For scheduled jobs in multi-user mode, we would have to generate insights per user. We could modify the cron function to loop through each user’s data (see the sketch after this list), or rely on database queries to do some of the aggregation in SQL. Either way, it’s additional complexity, but straightforward given the design.
  • Scalability: If there are many users, we might worry about the cron job taking too long scanning everyone’s data. Solutions could be:
    • Trigger analysis on a per-user schedule or when their data changes.
    • Use background job queues or Supabase Edge Functions instead of a single cron route.
    • But for a small number of users or moderate data, the single job approach is fine.
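
A rough sketch of that per-user loop, reusing the analysis logic from the cron route; the users table and the computeInsights helper are hypothetical names:

// Sketch: per-user analysis loop inside the cron handler (multi-user mode)
const { data: users } = await supabase.from('users').select('id');
for (const user of users ?? []) {
  const { data: events } = await supabase
    .from('events')
    .select('*')
    .eq('user_id', user.id);
  // computeInsights: hypothetical helper wrapping the analyses shown earlier
  const insights = computeInsights(events ?? [], user.id);
  await supabase.from('insights').delete().eq('user_id', user.id);
  if (insights.length > 0) {
    await supabase.from('insights').insert(insights);
  }
}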

In summary, to support multi-user, we primarily need to add an auth layer and ensure all operations are scoped by user. The data model is already prepared for this with the user_id field. No major architectural overhaul is required.

Conclusion

This technical design document has outlined the architecture and component design for the life tracking and analytics tool. We covered the flow of data from the user’s input, through NLP processing, into storage, and back out to visualizations and insights. The system leverages Next.js App Router for a cohesive full-stack implementation:

  • Frontend: Modern React UI with reusable components (via shadcn/ui) for forms and layout.
  • Backend: Lightweight serverless functions for CRUD and analysis logic.
  • Database: Reliable Postgres storage on Supabase, structured for extensibility.
  • Integrations: OpenAI GPT for intelligent text analysis, and Vercel Cron for scheduled computations.
  • Visualization: Recharts providing an interactive view of the data trends.

The design emphasizes clarity and maintainability, with code snippets and structure intended to guide implementation (even with GitHub Copilot). By following this design, a developer can incrementally build out the features – starting from event logging and display, then adding NLP enrichment, then charts, and finally the cron-based insights – testing each part along the way.

Ultimately, this tool will enable a user to record their life events with ease and gain meaningful insights into their habits and patterns, all within a single-user app that can later grow to support more users. The next steps would be to proceed with implementation of the database schema, then the Next.js pages and API routes as described, and integrate the OpenAI and Recharts libraries accordingly. With this blueprint, the development should be systematic and straightforward.