
METHODOLOGY

Full transparency on how scores are calculated, where data comes from, and why each analytical technique was chosen. No black boxes.

OUR APPROACH

Civitas is an open-data AI/ML platform that aggregates data from official U.S. government sources into unified transparency scorecards for senators, House representatives, presidents, and Supreme Court justices. Every score is computed from publicly available federal records. We do not editorialize, endorse, or oppose any candidate or party.

Scores reflect observable behavior — voting patterns, funding sources, legislative activity — not ideology. A senator who votes with their party 100% of the time receives a lower independence score regardless of whether they are a Democrat or Republican. The system is designed to be structurally non-partisan.

Every metric on the scorecard includes a [?] tooltip explaining what it measures and how to interpret it. Hover on desktop or tap on mobile. We believe no number should be presented without context — if you see a metric, you should be able to understand what it means and where it came from.

When data is missing or insufficient, scores default to a neutral 50 out of 100. No politician is penalized for something we cannot measure, and no politician receives a perfect score without evidence. This implements Bayesian shrinkage toward a neutral prior — a standard statistical technique for preventing extreme estimates from small samples. [19] Efron & Morris 1975

The Action Center extends this mission to daily civic engagement. It automatically surfaces trending issues from news analysis, provides objective summaries free of editorial opinion, and recommends non-partisan actions citizens can take to participate in their government — without assuming which side of any issue the reader supports.

SENATE SCORECARD METRICS

Each senator receives five sub-scores on a 0-100 scale, weighted into an overall Representation Score. Higher is better.

Funding Independence (25%)

Measures two independent dimensions: (1) the ratio of individual donors to corporate PAC money, and (2) top-donor concentration — what fraction of total fundraising comes from the top 10 donors. PAC dependency is operationalized following Stratmann (2005), [5] who found that PAC contributions are more strongly correlated with roll-call alignment than individual contributions. The donor concentration component applies the same intuition as HHI but at the donor level, following Bonica (2014) who demonstrated that donor composition is a strong predictor of legislative behavior. [1] Bonica 2014

Promise Persistence (20%)

Tracks whether a senator's voting record aligns with their stated campaign promises. Platform text is extracted from official senate.gov websites and analyzed to identify key commitments. Votes are then cross-referenced against those promises using semantic search to find relevant legislation. [2] Naurin 2011

A confidence penalty is applied when few promises are evaluable: if only 1 of 10 promises could be checked against votes, the score blends toward 50 (neutral) rather than being inflated by a single data point. This implements Bayesian shrinkage toward the prior. [19] Efron & Morris 1975
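
A minimal sketch of this blend, assuming the evidence weight is simply the fraction of promises that could be evaluated (the production weighting may differ):

```python
def shrink_toward_neutral(raw_score: float, n_evaluated: int, n_total: int,
                          prior: float = 50.0) -> float:
    """Blend a raw score toward a neutral prior when evidence is thin.

    Illustrative sketch only: the weight here is the fraction of promises
    that could actually be checked against votes.
    """
    if n_total == 0:
        return prior  # nothing measurable: default to neutral
    weight = n_evaluated / n_total  # evidence weight in [0, 1]
    return weight * raw_score + (1 - weight) * prior

# With 1 of 10 promises evaluable and a perfect raw score of 100,
# the blended score is 0.1 * 100 + 0.9 * 50 = 55, not 100.
```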

This metric also incorporates floor advocacy analysis — whether a senator actively speaks on the Senate floor about their promised issues, parsed from Congressional Record proceedings. This captures effort that voting records miss: in a gridlocked Senate, a senator may not get bills to a vote but can still demonstrate persistence through floor speeches. The floor advocacy component is weighted at 15% of the promise score, following research on legislative speech as a signal of commitment. [3] Martin 2011

Independent Voting (20%)

Measures willingness to break with party leadership on votes that are not explained by constituent interests. We identify state-relevant policy areas by analyzing the senator's top donor industries (which serve as a proxy for the state's economic composition). Party-line votes on state-relevant issues are excluded from the independence penalty because they may reflect genuine constituent representation rather than blind party loyalty. [4] Carson et al. 2010

The score is adjusted by state partisan lean using Cook PVI as a proxy: a senator in a safe R+20 state voting with their party may be representing constituents, not following orders. Raw break rates are misleading without this contextual adjustment.

The score blends two components: party independence (60%) — the rate of breaking with the party on non-state-relevant votes — and donor independence (40%) — whether votes appear free from donor influence, measured by lobbying match alignment and PAC funding levels. We follow the methodological caution of Ansolabehere et al. (2003) [18] in interpreting donation-vote correlations: correlation does not prove causation. [5] Stratmann 2005

Funding Diversity (15%)

Evaluates how traceable and diverse a senator's funding sources are. The score blends donor traceability (50%) — the fraction of funding from itemized (>$200), disclosed sources versus anonymous small-dollar contributions — with industry diversity (50%), measured as the inverse Herfindahl-Hirschman Index (HHI) of industry donations. HHI is a standard concentration metric from industrial organization economics. [6] Rhoades 1993 In this context, funding concentrated in a single industry suggests potential regulatory capture, while broad funding suggests diverse constituent support.
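
A minimal sketch of the industry-diversity component, assuming a simple 1 - HHI rescaling to 0-100 (the pipeline's exact normalization may differ):

```python
def industry_diversity(donations_by_industry: dict[str, float]) -> float:
    """Inverse-HHI diversity on a 0-100 scale (illustrative sketch).

    HHI is the sum of squared funding shares: 1.0 means all money comes
    from one industry, 1/N means perfectly even across N industries.
    """
    total = sum(donations_by_industry.values())
    if total == 0:
        return 50.0  # no data: neutral default
    shares = [amount / total for amount in donations_by_industry.values()]
    hhi = sum(s * s for s in shares)
    return (1.0 - hhi) * 100.0

# Example: 90% oil & gas, 10% healthcare -> HHI = 0.82, diversity ≈ 18.
```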

Legislative Effectiveness (20%)

Measures how effective a senator is at advancing legislation. The score combines bill passage rates, cosponsorship network influence (PageRank), and ability to move bills through committee and floor consideration. Higher scores indicate senators who successfully shepherd bills into law and attract bipartisan cosponsorship.

HOUSE REPRESENTATIVE SCORECARDS

All 435 House representatives are scored using the same five-metric framework as the Senate: Funding Independence, Promise Persistence, Independent Voting, Funding Diversity, and Legislative Effectiveness. The data sources (FEC, Congress.gov, GovInfo) and classification techniques are identical, ensuring consistent, comparable scores across both chambers.

House members are sourced from the same Congress.gov API and FEC endpoints. The pipeline processes representatives in the same nightly run as senators, using the same embedding-based classification, content-based party alignment, and deterministic scoring formulas. The House leaderboard supports pagination and party filtering to navigate the larger membership.

SPONSORSHIP ANALYSIS (LEADERSHIP & IDEOLOGY)

Beyond the five scored metrics, each senator receives two informational metrics derived from cosponsorship networks — the pattern of which senators sign onto each other's bills. These metrics are not part of the Representation Score but provide additional context about a senator's role and positioning.

Legislative Leadership (0-100)

Measures legislative influence using the PageRank algorithm [32] Brin & Page 1998 applied to cosponsorship networks. When Senator A cosponsors Senator B's bill, that creates a directed link in the network. PageRank computes centrality: a senator whose bills attract many cosponsors — especially from other influential senators — receives a higher score. This mirrors GovTrack's leadership methodology. [33] Tauberer 2012

The algorithm uses power iteration with a damping factor of 0.85 and converges in ~50 iterations. Raw PageRank values are rescaled to [0, 1] using a logarithmic transformation to compress the heavy-tailed distribution, then displayed as 0-100.
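
A sketch of the computation described above; the adjacency construction and final rescaling are simplified for illustration:

```python
import numpy as np

def leadership_scores(adj: np.ndarray, damping: float = 0.85,
                      iters: int = 50) -> np.ndarray:
    """PageRank by power iteration over a cosponsorship adjacency matrix.

    adj[i, j] = 1 when senator i cosponsors a bill sponsored by senator j,
    so rank flows from cosponsors to sponsors. Illustrative sketch only.
    """
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1.0           # avoid division by zero
    transition = adj / out_degree                # row-stochastic matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * (transition.T @ rank)
    # Compress the heavy-tailed distribution with a log transform,
    # rescale to [0, 1], and display as 0-100.
    log_rank = np.log(rank)
    if log_rank.max() == log_rank.min():
        return np.full(n, 50.0)
    scaled = (log_rank - log_rank.min()) / (log_rank.max() - log_rank.min())
    return scaled * 100.0
```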

Ideology Score (0-1)

Computes a behavioral ideological position using Singular Value Decomposition (SVD) on the cosponsorship matrix, following Tauberer (2012). [33] The second singular vector (first is trivially related to overall activity) captures the primary ideological dimension — the axis along which senators most differ in who they cosponsor. This is analogous to DW-NOMINATE [20] Poole & Rosenthal 1985 but derived from cosponsorship patterns rather than roll-call votes.
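
A sketch of the SVD step, assuming a binary senator-by-bill cosponsorship matrix; orientation and calibration against party means happen separately, as described below:

```python
import numpy as np

def ideology_positions(cosponsor_matrix: np.ndarray) -> np.ndarray:
    """Ideology estimate from the second left singular vector (sketch).

    cosponsor_matrix[i, j] = 1 if senator i cosponsored bill j. The first
    singular vector mostly tracks overall activity; the second captures
    the dominant axis along which cosponsorship patterns differ.
    """
    u, s, vt = np.linalg.svd(cosponsor_matrix, full_matrices=False)
    dim = u[:, 1]                               # second singular vector
    # Rescale to [0, 1]; which end is "conservative" is set afterwards
    # by comparing party mean scores.
    return (dim - dim.min()) / (dim.max() - dim.min())
```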

The ideology score is oriented so that lower values correspond to progressive positions and higher values to conservative positions, calibrated by checking the mean score of each party. It serves as a Bayesian prior for the partisan depth calculation: when a senator has few recorded votes, the ideology score regularizes the estimate; as vote data accumulates, the prior weight drops to zero. [19] Efron & Morris 1975

Sponsorship Description

Combines the leadership and ideology scores into a human-readable label (e.g., "progressive Democratic leader" or "conservative Republican backbencher"). The label encodes three dimensions: ideological position (progressive/moderate/conservative), party affiliation, and influence tier (leader/rank-and-file/backbencher).

PRESIDENTIAL SCORECARD METRICS

Presidents are scored on six dimensions, also 0-100 scale. Historical presidents (pre-Clinton) use static scores derived from the C-SPAN Presidential Historians Survey, Gallup approval records, and BEA/BLS economic data. Recent presidents (Clinton onward) have scores partially computed from live API data.

Independence (15%)

Assesses cabinet and advisor independence from corporate and lobbyist influence. Based on historical analysis of cabinet compositions — how many appointees came from industry versus public service backgrounds. Currently uses curated seed data; automated analysis is planned.

Follow-Through (20%)

Measures the ratio of campaign promises to executive and legislative action. Based on historian assessments and promise-tracking analysis. Currently uses curated seed data for historical presidents.

Public Mandate (15%)

Reflects approval trajectory and coalition retention. For modern presidents (Truman onward), this is grounded in Gallup average approval ratings. Pre-Gallup presidents are scored based on election margins and historian consensus.

Effectiveness (20%) — Partially Dynamic

Measures tangible economic outcomes: GDP growth and job creation. The pipeline fetches real employment data from the Bureau of Labor Statistics API (nonfarm payroll series CES0000000001) and calculates net jobs created during each term. GDP growth averages are seeded from BEA National Income and Product Accounts tables. The formula weights GDP at 60% and job creation at 40%, normalized against post-WWII historical averages.
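
A minimal sketch of the blend, using the 60/40 weights above; the baseline constants and capping scheme are illustrative assumptions, not the pipeline's actual normalization:

```python
def effectiveness_score(avg_gdp_growth: float, jobs_created: float,
                        baseline_gdp_growth: float = 2.9,
                        baseline_jobs: float = 10_000_000) -> float:
    """Blend GDP growth (60%) and job creation (40%) into a 0-100 score.

    The baselines stand in for post-WWII averages and are assumptions;
    each component is a capped ratio to its baseline.
    """
    gdp_component = max(0.0, min(avg_gdp_growth / baseline_gdp_growth, 2.0)) / 2.0
    jobs_component = max(0.0, min(jobs_created / baseline_jobs, 2.0)) / 2.0
    return (0.6 * gdp_component + 0.4 * jobs_component) * 100.0
```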

Competence (15%) — Partially Dynamic

Evaluates administrative execution quality. The pipeline fetches executive order counts from the Federal Register API (federalregister.gov) for each presidential term. The score blends court success rate (40%), cabinet stability (30%), and EO activity rate (30%), where a moderate rate of executive action scores higher than extremes in either direction.

Agency Alignment (15%)

Measures how well executive agency actions align with stated presidential priorities. Evaluates whether federal agencies pursue the policy agenda the president campaigned on, based on regulatory actions and executive directives. Currently uses curated seed data; automated regulatory tracking is planned.

SUPREME COURT JUSTICE SCORECARDS

Justices are scored on impartiality and ideological consistency using case-level voting data from the Oyez Project and official Supreme Court records. Case opinions link directly to the official supremecourt.gov slip opinion PDFs.

Justice scoring evaluates whether a justice applies consistent legal principles across cases or shifts positions based on the political valence of the parties involved. This is analogous to the independence metric used for senators but adapted to the judicial context where party loyalty is replaced by jurisprudential consistency.

ACTION CENTER

The Action Center surfaces the most important civic issues of the day using automated news analysis. It is designed to inform, not persuade — every summary is non-partisan and presents facts without editorial framing.

NEWS ANALYSIS PIPELINE

RSS feeds from editorially independent, low-bias news sources (AP News, NPR Politics, Reuters, PBS NewsHour) are parsed hourly. Each article is filtered for U.S. policy relevance using embedding cosine similarity against policy area prototypes — the same sentence-transformer model used throughout the platform. Articles that pass the relevance threshold are clustered by semantic similarity to group coverage of the same story across sources.
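
A minimal sketch of the relevance filter; the prototype texts and the similarity threshold shown here are illustrative assumptions:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder policy-area prototypes (not the pipeline's actual texts).
policy_prototypes = [
    "federal legislation and congressional policy debate",
    "executive branch regulation and agency rulemaking",
    "U.S. federal budget, taxes, and spending",
]
prototype_embeddings = model.encode(policy_prototypes, normalize_embeddings=True)

def is_policy_relevant(article_text: str, threshold: float = 0.35) -> bool:
    """Keep an article if it is close enough to any policy prototype."""
    embedding = model.encode(article_text, normalize_embeddings=True)
    similarity = util.cos_sim(embedding, prototype_embeddings).max().item()
    return similarity >= threshold
```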

TRENDING TOPIC INTEGRATION

Clusters are ranked using a weighted combination of coverage breadth (40%) — how many independent sources cover the story — and trending relevance (60%) — whether the topic aligns with what the public is actively discussing. Trending signals are drawn from Google Trends and policy-relevant Reddit communities, cross-referenced with the news clusters via embedding similarity.

NON-PARTISAN SUMMARIZATION

The top-ranked issues are summarized by the LLM with explicit instructions to present objective facts, avoid opinion or editorial framing, and recommend actions that do not assume which side of an issue the reader supports. Recommended actions include contacting representatives, attending public hearings, and reviewing primary source documents — not advocating for or against any policy position.

CROSS-REFERENCING

When a ranked politician is involved in a trending issue, the Action Center links directly to their scorecard. Related government documents from the Explore database are matched using semantic search. Source articles include direct links to the original reporting.

GOVERNMENT ACTIVITY TABS

Dedicated tabs for all three branches of government — Legislative (Senate and House), Executive, and Judicial — display the most recent government documents: floor speeches, executive orders, proposed rules, court opinions, and notices, pulled directly from the Explore database.

NATIONAL MONITORS

When an issue persists in the news across multiple days, the system automatically creates a National Monitor — a dedicated tracking page for that ongoing concern. Monitors build a sourced timeline of developments, detect when separate news stories are facets of the same underlying event using embedding similarity, and merge duplicate monitors automatically. Monitors transition to "watching" status when coverage subsides and reactivate when new developments appear.

YEAR-IN-REVIEW TIMELINE

Each day's top issue is permanently recorded in a timeline that accumulates throughout the calendar year. The Timeline tab provides a month-by-month chronological view of what mattered most, with top policy themes calculated for each month and the year as a whole. At year's end, this becomes a complete "Year in Review" of the issues that shaped civic life.

ELECTIONS TAB

The Elections tab displays upcoming election dates, Senate races with incumbent scores linked to their scorecards, and an interactive U.S. map for selecting states. State-specific information helps users understand their local races in the context of national trends.

INTERACTIVE GLOBAL NEWS MAP

The World tab features a 3D interactive globe that visualizes U.S.-related international news coverage. Countries mentioned in current news feeds are highlighted with points scaled by article count. Clicking a country scrolls to recent headlines about U.S. relations with that nation, linking to the original source articles.

CONTENT-BASED PARTY ALIGNMENT

A bill's partisan alignment is determined by analyzing what the bill does, not how senators voted on it. This is a deliberate architectural decision grounded in political science methodology.

The standard approach in political science — roll-call-based ideology estimation (DW-NOMINATE) [20] Poole & Rosenthal 1985 — assumes sincere voting. But as Clinton, Jackman & Rivers (2004) note, this assumption is routinely violated by logrolling (vote trading), whip pressure, omnibus packaging, and tactical compromises. [21] A senator might vote for a bill they ideologically oppose to secure support for a different bill, or because party leadership made it a litmus test.

HOW IT WORKS

We implement a nearest-centroid classifier (Rocchio 1971) [22] Manning, Raghavan & Schütze 2008 in sentence-embedding space. Each party's known platform positions on each policy area (taxes, healthcare, environment, etc.) are embedded as centroids using the same sentence-transformer model used throughout the pipeline. Bill text is then embedded and compared to both party centroids via cosine similarity.
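
A minimal sketch of the centroid comparison, with placeholder platform snippets standing in for the real per-policy-area prototypes:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative placeholder positions, not the actual prototype texts.
party_platform_texts = {
    "D": "expand access to affordable healthcare; strengthen environmental enforcement",
    "R": "reduce regulation and taxes; expand domestic energy production",
}
centroids = {party: model.encode(text, normalize_embeddings=True)
             for party, text in party_platform_texts.items()}

def party_alignment(bill_text: str) -> tuple[str, float]:
    """Return the closer party centroid and the cosine-similarity margin."""
    embedding = model.encode(bill_text, normalize_embeddings=True)
    scores = {party: util.cos_sim(embedding, centroid).item()
              for party, centroid in centroids.items()}
    best = max(scores, key=scores.get)
    margin = abs(scores["D"] - scores["R"])  # small margin = low confidence
    return best, margin
```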

The stance direction (pro/anti) disambiguates cases where both parties have positions on the same topic: a "pro" environment bill (strengthen EPA enforcement) aligns with the Democratic platform, while an "anti" environment bill (roll back regulations) aligns with the Republican platform. This encodes the saliency-plus-direction model from manifesto research. [23] Laver & Garry 2000

TWO-SIGNAL FUSION

Content analysis is the primary signal for party alignment. Vote tallies from roll-call data serve as a secondary refinement. When both agree, confidence is high. When they disagree, content wins unless the vote data shows a clear party-line split (which is itself informative — the bill was important enough to whip). This follows Snyder & Groseclose (2000) who demonstrated that vote outcomes reflect party discipline as much as ideology. [24] Snyder & Groseclose 2000
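
A sketch of the fusion rule described above; the confidence handling in the real pipeline may be more nuanced:

```python
def fuse_alignment(content_party: str,
                   vote_party: str | None,
                   party_line_split: bool) -> str:
    """Combine content-based alignment with roll-call evidence (sketch).

    Content analysis is the primary signal; vote tallies only override it
    when they show a clear party-line split.
    """
    if vote_party is None or vote_party == content_party:
        return content_party   # agreement (or no vote data): high confidence
    if party_line_split:
        return vote_party      # whipped party-line vote is itself informative
    return content_party       # otherwise content wins
```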

ADAPTIVE LEARNING

Platform position descriptions are seed prototypes bootstrapped from published party platforms. As the pipeline processes bills, sponsor party data from Congress.gov serves as supervised ground truth — bills sponsored by a single party are labeled examples that refine the classifier over time. This follows the self-training paradigm. [25] Yarowsky 1995

CLASSIFICATION AND NLP PIPELINE

The pipeline classifies thousands of entities (bills, donors, industries, votes) per run. We use a tiered strategy that reserves expensive techniques for cases where cheaper methods fail, following the computational parsimony principle. [12] Jurafsky & Martin 2023

BILL POLICY AREA CLASSIFICATION

Bills and votes are classified into 15 policy areas (healthcare, defense, energy, etc.) using a tiered adaptive strategy:

  1. Learning store exact match — bills classified in prior pipeline runs are recalled instantly by ID. This is the experience replay pattern. [10] Lin 1992
  2. kNN against reference corpus — the k=7 most similar previously-classified bills in ChromaDB are retrieved and the policy area is assigned by similarity-weighted majority vote (see the sketch after this list). [9] Cover & Hart 1967 This is retrieval-augmented classification: the reference corpus grows with each pipeline run, improving accuracy over time. [26] Lewis et al. 2020
  3. Embedding similarity against policy descriptions — cosine similarity between bill text embeddings and pre-computed policy area description embeddings. This is the cold-start fallback using nearest-centroid classification. [7] Reimers & Gurevych 2019
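
A minimal sketch of the similarity-weighted vote in tier 2, assuming the neighbor similarities and labels have already been retrieved from the vector store:

```python
from collections import defaultdict

def knn_policy_area(similarities: list[float], labels: list[str], k: int = 7) -> str:
    """Similarity-weighted majority vote over the k nearest labeled bills."""
    neighbors = sorted(zip(similarities, labels), reverse=True)[:k]
    votes: dict[str, float] = defaultdict(float)
    for similarity, label in neighbors:
        votes[label] += similarity    # closer neighbors count for more
    return max(votes, key=votes.get)

# Example: five "healthcare" neighbors at ~0.8 (total 4.0) outweigh two
# "taxation" neighbors at ~0.9 (total 1.8), so the bill is healthcare.
```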

The policy taxonomy is based on the Congressional Research Service (CRS) policy area scheme used by Congress.gov. The approach follows the text-as-data paradigm reviewed in Grimmer & Stewart (2013). [27] Stance derivation (pro/anti/neutral) uses embedding cosine similarity against direction prototypes — the bill text is compared to semantic signatures of supportive, restrictive, and reform-oriented legislative language, following the Comparative Agendas Project coding tradition. [28] Baumgartner & Jones 1993 Zero LLM calls are used for bill classification.

DONOR AND INDUSTRY CLASSIFICATION

Donor classification uses a five-tier strategy:

  1. FEC metadata — structured fields from the Federal Election Commission API encode committee type and designation codes, providing ground-truth classification for PACs vs. individual donors.
  2. Semantic detection — embedding cosine similarity against category prototypes replaces ~200 lines of hardcoded string patterns. This generalizes to unseen entities because distributed representations capture semantic meaning. [29] Bengio et al. 2003
  3. Learning store lookup — previously classified entities are recalled instantly by name.
  4. Embedding cosine similarity — donor names are compared against pre-computed industry description embeddings. Industry descriptions include exemplar company names as anchoring tokens, following the zero-shot classification setup. [30] Yin, Hay & Roth 2019
  5. k-Nearest Neighbor (kNN) — remaining unclassified donors are classified by the k=7 most similar already-labeled entities using distance-weighted majority voting. [9] Cover & Hart 1967 This mirrors prototypical networks for few-shot learning, [14] Snell et al. 2017 where classification is performed by comparing query embeddings to accumulated real examples.

The kNN approach was chosen over LLM-based classification after empirical testing showed the LLM hallucinated invalid categories (producing labels like "SPORTS" or "RESTAURANT" outside the valid taxonomy) and was orders of magnitude slower. The kNN classifier processes ~5,000 donors in under 5 seconds versus 40+ minutes for the LLM, with more consistent results.

LEARNING STORE AND ADAPTIVE CLASSIFICATION

All classifications are persisted in a learning store (SQLite table) that functions as an evolving knowledge base. On subsequent pipeline runs, previously classified entities are retrieved instantly without recomputation. This is analogous to experience replay in reinforcement learning [10] Lin 1992 — past decisions inform future ones, improving both speed and accuracy over time.

The learning store also feeds into the self-training loop [25] Yarowsky 1995 — high-confidence classifications from prior runs become labeled examples for kNN and reference corpus retrieval in future runs. The system literally gets better each time the pipeline runs.

To prevent stale data from persisting when analysis algorithms are updated, the pipeline implements version-aware artifact management. At the start of each run, a SHA-256 fingerprint of all analysis source files is compared to the stored hash from the previous run. If the code is unchanged, all learning data is preserved to promote self-training. If the code has changed, stale artifacts (LLM results, learned classifications, kNN reference corpus) are automatically cleared so updated algorithms start fresh. The API cache (raw data from government APIs) is never cleared.
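
A minimal sketch of the fingerprint-and-compare step; the directory layout and file selection here are assumptions for illustration:

```python
import hashlib
from pathlib import Path

def source_fingerprint(source_dir: str) -> str:
    """SHA-256 fingerprint over all analysis source files (sketch)."""
    digest = hashlib.sha256()
    for path in sorted(Path(source_dir).rglob("*.py")):
        digest.update(path.read_bytes())
    return digest.hexdigest()

def should_clear_artifacts(previous_hash: str | None,
                           source_dir: str = "analysis/") -> bool:
    """Return True if learned artifacts should be cleared for this run.

    The API cache of raw government data is never cleared, regardless.
    """
    current = source_fingerprint(source_dir)
    return previous_hash is not None and previous_hash != current
```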

SEMANTIC SEARCH (EXPLORE)

The Explore feature uses dense passage retrieval [11] Karpukhin et al. 2020 to enable free-text search over government documents (floor speeches, executive orders, bills). Documents are chunked, embedded with all-MiniLM-L6-v2, and stored in ChromaDB for approximate nearest-neighbor retrieval. This outperforms keyword search (BM25) for conceptual queries like "climate policy" where exact term overlap is low.
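
A minimal sketch of the index-and-query path using the ChromaDB and sentence-transformers APIs; the collection and path names are illustrative, not the project's actual identifiers:

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection("gov_documents")

def index_chunks(ids: list[str], chunks: list[str]) -> None:
    """Embed document chunks and store them for nearest-neighbor search."""
    embeddings = model.encode(chunks, normalize_embeddings=True).tolist()
    collection.add(ids=ids, documents=chunks, embeddings=embeddings)

def search(query: str, n_results: int = 5) -> list[str]:
    """Return the chunks most semantically similar to a free-text query."""
    query_embedding = model.encode(query, normalize_embeddings=True).tolist()
    results = collection.query(query_embeddings=[query_embedding],
                               n_results=n_results)
    return results["documents"][0]
```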

HOW AI IS USED

Civitas uses two types of AI models, each for the task it is best suited for. AI is never used to generate scores directly — all scores are computed by deterministic, auditable formulas.

EMBEDDING MODEL (CLASSIFICATION + SEARCH)

all-MiniLM-L6-v2 (22M parameters) [8] Wang et al. 2020 handles all classification tasks: bill policy areas, donor industries, party alignment, motion types, and semantic search retrieval. Sentence-transformers produce dense vector representations where cosine similarity correlates with semantic similarity [7] Reimers & Gurevych 2019 — making them ideal for classification-by-comparison tasks where category definitions exist.

LLM (NARRATIVE SYNTHESIS)

Qwen 2.5 1.5B via llama.cpp [16] Gerganov 2023 handles tasks requiring natural language understanding and multi-step reasoning:

Campaign promise extraction: Parses platform text from senator websites to identify specific policy commitments and assess whether votes support or contradict them
Voting pattern narrative: Generates human-readable summaries of a senator's voting patterns across policy areas
Key vote reasoning: Explains why specific votes were flagged as significant given a senator's donor profile and party dynamics
PAC identification: Identifies the parent organization and industry behind opaque PAC names using world knowledge
Explore summaries: On-demand summaries of how a government document relates to a user's search query

WHAT AI DOES NOT DO

Score calculation: All five sub-scores use deterministic formulas with no LLM input. The math is fully auditable.
Bill classification: Policy areas, party alignment, and stance are all embedding-based — no LLM in the loop.
Donor classification: FEC metadata + embeddings + kNN handle all donor and industry classification.
Data fabrication: The LLM only analyzes data already fetched from official APIs. It does not generate or invent facts.
Partisan framing: Prompts are explicitly structured to avoid editorial framing. The LLM analyzes behavior, not ideology.

WHY THESE TECHNIQUES WERE CHOSEN

We follow a strict hierarchy: structured metadata first, then embedding similarity, then kNN, then LLM — reserving each more expensive technique only for tasks the cheaper ones cannot handle. The pipeline contains zero hardcoded keyword lists, regex patterns, or string-matching heuristics for classification decisions. Every classification is made mathematically via embedding cosine similarity against natural-language prototypes. [12] Jurafsky & Martin 2023

Embeddings (not LLM) for classification: sentence embeddings excel at text classification tasks when labeled examples or category descriptions exist. They are deterministic, fast, and avoid the hallucination risks inherent in generative models. [13] Minaee et al. 2021 The kNN classifier further leverages accumulated labeled data as a growing reference set — a well-established approach in few-shot and semi-supervised learning settings. [14] Snell et al. 2017

Content analysis (not votes) for party alignment: roll-call votes confound ideology with legislative strategy. Analyzing what a bill does relative to published party platforms recovers ideological alignment more accurately, following the manifesto analysis tradition. [31] Laver, Benoit & Garry 2003

LLM for narrative synthesis: tasks like promise-vote cross-referencing and PAC identification require world knowledge and multi-step reasoning that embeddings alone cannot provide. These are inherently generative tasks suited to language models. [15] Wei et al. 2022

MODEL AND ARCHITECTURE

The inference model is Qwen 2.5 1.5B, a compact open-weight language model running natively via llama.cpp [16] Gerganov 2023 compiled with ARM-specific optimizations (cortex-a76, dot-product, fp16). This provides ~3x faster inference compared to containerized runtimes, generating ~8 tokens/second on the Raspberry Pi 5 CPU. Results are cached in a local database so each unique analysis is computed at most once.

The embedding model is all-MiniLM-L6-v2 [8] Wang et al. 2020, a 22M-parameter sentence transformer. It handles all classification (bills, donors, industries, party alignment), semantic search, and nearest-neighbor retrieval. Both models run entirely on-device with no external API calls.

DATA SOURCES AND APIs

All data is sourced from official US government APIs and public records. No data is purchased, scraped from paywalled sources, or fabricated.

SENATE DATA

Congress.gov API: Bill text, voting records, member data, sponsored legislation, and bill sponsor party affiliation
FEC API (fec.gov): Campaign finance data, including individual contributions, PAC donations, committee filings, disbursements, and committee type codes
GovInfo API: Full bill text for policy area classification, Congressional Record floor proceedings for advocacy analysis
Senate.gov: Official senator websites scraped for platform text and campaign promises; roll-call vote records with per-member votes

PRESIDENTIAL DATA

Federal Register API: Executive order counts and metadata from federalregister.gov (Clinton onward, no API key required)
BLS API: Bureau of Labor Statistics public API — total nonfarm employment payrolls for jobs-created calculations
C-SPAN Historians Survey: Presidential Historians Survey (2021) used as the basis for historical president scoring
Gallup Historical Data: Average approval ratings for modern presidents (Truman onward)
BEA NIPA Tables: Bureau of Economic Analysis GDP growth data

EXPLORE FEATURE

Congressional Record (GovInfo): Senate and House floor proceedings — speaker-attributed transcripts from daily CREC packages
Federal Register: Executive orders, presidential memoranda, and proclamations with full text and metadata
Semantic Search: Documents embedded with all-MiniLM-L6-v2 into ChromaDB for dense passage retrieval

SUPREME COURT DATA

Oyez Project API: Case metadata, justice votes, oral argument transcripts, and decision breakdowns
supremecourt.gov: Official slip opinion PDFs linked directly from case records

RATE LIMITING

The pipeline respects all API rate limits: Congress.gov at 1.2 requests/second, FEC at 0.25 req/s, GovInfo at 1.0 req/s, and BLS at 25 queries/day. Data is cached for 72 hours to minimize redundant API calls.
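
A minimal sketch of the per-API throttling pattern; the actual client implementation may differ:

```python
import time

class RateLimiter:
    """Minimum-interval rate limiter.

    Example: RateLimiter(1.2) for Congress.gov, RateLimiter(0.25) for FEC.
    """

    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.last_call = 0.0

    def wait(self) -> None:
        """Sleep just long enough to respect the configured rate."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()
```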

ENVIRONMENTAL AND ETHICAL CONSIDERATIONS

LOCAL-FIRST ARCHITECTURE

The entire Civitas stack runs on a single Raspberry Pi 5 (16GB RAM) with an NVMe SSD. There are no cloud GPU instances, no third-party AI API calls, and no data sent to external services for processing. The LLM, embedding model, vector database, SQLite database, backend API, and frontend all run on the same device.

ENERGY FOOTPRINT

A Raspberry Pi 5 draws approximately 5-12 watts under load. Running the full data pipeline (100 senators, ~100 LLM calls) takes several hours but consumes roughly as much energy as a single LED light bulb running for the same period. By comparison, a typical cloud GPU instance (NVIDIA A100) draws 250-400 watts. [17] Patterson et al. 2021 This project demonstrates that meaningful AI analysis does not require industrial-scale compute. The trade-off is speed: what a cloud GPU processes in minutes takes hours on a Pi. We consider that an acceptable trade for a nightly batch pipeline.
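
As rough, back-of-the-envelope arithmetic (the run length and wattages are assumptions within the ranges above):

```python
# Back-of-the-envelope only: assumed 6-hour run at a mid-range 10 W draw.
pi_watts, run_hours = 10, 6
pipeline_energy_wh = pi_watts * run_hours   # 60 Wh, i.e. about 0.06 kWh
led_bulb_energy_wh = 10 * run_hours         # a 10 W LED bulb over the same period
```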

DATA PRIVACY

No user data is collected, stored, or transmitted. The site does not use cookies, analytics trackers, or advertising networks. All data displayed is derived exclusively from public government records. The only network requests are to official government APIs (congress.gov, fec.gov, api.bls.gov, federalregister.gov).

OPEN-WEIGHT MODEL

We deliberately chose Qwen 2.5, an open-weight model, over proprietary alternatives like GPT-4 or Claude. This means: no per-token API costs that could make the project financially unsustainable, no dependency on a third-party company's continued service, full auditability of the model's behavior, and no user queries or government data leaving the device.

LIMITATIONS AND HONESTY

This project has real limitations and we believe in stating them clearly:

  • A 1.5B parameter model is less capable than larger models. It occasionally produces imprecise promise analysis. We mitigate this with caching, post-processing heuristics, and deterministic overrides where the model output can be verified against structured data.
  • Historical presidential scores are editorially curated based on historian surveys, not computed from raw data. We are transparent about this distinction.
  • Correlation between donations and votes does not prove causation. A senator who receives PAC money and votes favorably may be doing so for policy reasons unrelated to the donation. We follow the methodological caution urged by Ansolabehere et al. (2003). [18] Ansolabehere et al. 2003
  • Content-based party alignment depends on the quality of platform position descriptions. While these are seeded from published party platforms and refined by sponsor data, edge cases involving bipartisan or cross-cutting legislation may be misclassified.
  • FEC data has inherent reporting delays. Campaign finance filings may lag real-time donations by weeks or months.
  • Embedding-based classification, while fast and consistent, lacks the world knowledge that a large model or human expert would bring. Edge cases involving shell companies or deliberately obscure entity names may be misclassified.

TECHNICAL STACK

Hardware: Raspberry Pi 5 (16GB), NVMe SSD
Backend: Python 3.12, FastAPI, SQLAlchemy, SQLite
Frontend: Next.js 14, React 18, TypeScript, Tailwind CSS
Embedding Model: all-MiniLM-L6-v2 (22M params, sentence-transformers)
LLM Runtime: llama.cpp (native ARM build), Qwen 2.5 1.5B
Vector Database: ChromaDB (persistent, local)
Containers: Docker Compose (blue/green zero-downtime deploy via nginx)
Pipeline Schedule: Nightly at 3:00 AM via APScheduler
Data Caching: 72-hour TTL with persistent SQLite cache
Learning Store: SQLite table for persistent classification memory, version-aware invalidation on code change
Pipeline Optimization: Producer-consumer threading (embedding prefetch overlaps LLM inference), context compression for prompts
API Pagination: Server-side paginated voting records with filter support
Sponsorship Analysis: PageRank (leadership) + SVD (ideology) on cosponsorship matrix
Classification: Zero hardcoded rules — all classifications via embedding similarity or kNN
Metric Tooltips: Every scorecard metric has a [?] tooltip explaining what it measures
Branches Covered: Senate (100), House (435), Presidents (historical + modern), Supreme Court (9 justices)
Action Center: Hourly news analysis with national monitors for ongoing concerns and year-in-review timeline tracking
News Sources: AP News, NPR Politics, Reuters, PBS NewsHour — editorially independent, low-bias wire services
Trending Integration: Google Trends RSS + Reddit policy subreddits, cross-referenced via embedding similarity
Globe Visualization: react-globe.gl — interactive 3D globe for international news mapping

REFERENCES

  [1] Bonica, A. (2014). Mapping the Ideological Marketplace. American Journal of Political Science, 58(2), 367-386. doi:10.1111/ajps.12062
  [2] Naurin, E. (2011). Election Promises, Party Behaviour and Voter Perceptions. Palgrave Macmillan. doi:10.1057/9780230304598
  [3] Martin, S. (2011). Using Parliamentary Questions to Measure Constituency Focus. Political Studies, 59(2), 472-488. doi:10.1111/j.1467-9248.2011.00885.x
  [4] Carson, J. L., Koger, G., Lebo, M. J., & Young, E. (2010). The Electoral Costs of Party Loyalty in Congress. American Journal of Political Science, 54(3), 598-616. doi:10.1111/j.1540-5907.2010.00449.x
  [5] Stratmann, T. (2005). Some Talk: Money in Politics. A (Partial) Review of the Literature. Public Choice, 124(1-2), 135-156. doi:10.1007/s11127-005-4750-3
  [6] Rhoades, S. A. (1993). The Herfindahl-Hirschman Index. Federal Reserve Bulletin, 79, 188-189.
  [7] Reimers, N. & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of EMNLP-IJCNLP 2019, 3982-3992. doi:10.18653/v1/D19-1410
  [8] Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020). MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Proceedings of NeurIPS 2020. arXiv:2002.10957
  [9] Cover, T. & Hart, P. (1967). Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, 13(1), 21-27. doi:10.1109/TIT.1967.1053964
  [10] Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3-4), 293-321. doi:10.1007/BF00992699
  [11] Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of EMNLP 2020, 6769-6781. doi:10.18653/v1/2020.emnlp-main.550
  [12] Jurafsky, D. & Martin, J. H. (2023). Speech and Language Processing (3rd ed. draft). Stanford University.
  [13] Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2021). Deep Learning-Based Text Classification: A Comprehensive Review. ACM Computing Surveys, 54(3), 1-40. doi:10.1145/3439726
  [14] Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical Networks for Few-Shot Learning. Proceedings of NeurIPS 2017, 4077-4087. arXiv:1703.05175
  [15] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Proceedings of NeurIPS 2022. arXiv:2201.11903
  [16] Gerganov, G. (2023). llama.cpp: Inference of LLaMA model in pure C/C++. GitHub. github.com/ggerganov/llama.cpp
  [17] Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2021). Carbon Emissions and Large Neural Network Training. arXiv:2104.10350
  [18] Ansolabehere, S., de Figueiredo, J. M., & Snyder, J. M. (2003). Why Is There So Little Money in U.S. Politics? Journal of Economic Perspectives, 17(1), 105-130. doi:10.1257/089533003321164976
  [19] Efron, B. & Morris, C. (1975). Data Analysis Using Stein's Estimator and Its Generalizations. Journal of the American Statistical Association, 70(350), 311-319. doi:10.2307/2285814
  [20] Poole, K. T. & Rosenthal, H. (1985). A Spatial Model for Legislative Roll Call Analysis. American Journal of Political Science, 29(2), 357-384. doi:10.2307/2111172
  [21] Clinton, J., Jackman, S., & Rivers, D. (2004). The Statistical Analysis of Roll Call Data. American Political Science Review, 98(2), 355-370. doi:10.1017/S0003055404001194
  [22] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. Ch. 14: Vector Space Classification.
  [23] Laver, M. & Garry, J. (2000). Estimating Policy Positions from Political Texts. American Journal of Political Science, 44(3), 619-634. doi:10.2307/2669268
  [24] Snyder, J. M. & Groseclose, T. (2000). Estimating Party Influence in Congressional Roll-Call Voting. American Journal of Political Science, 44(2), 193-211. doi:10.2307/2669305
  [25] Yarowsky, D. (1995). Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. Proceedings of ACL 1995, 189-196. doi:10.3115/981658.981684
  [26] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Proceedings of NeurIPS 2020. arXiv:2005.11401
  [27] Grimmer, J. & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267-297. doi:10.1093/pan/mps028
  [28] Baumgartner, F. R. & Jones, B. D. (1993). Agendas and Instability in American Politics. University of Chicago Press.
  [29] Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, 1137-1155.
  [30] Yin, W., Hay, J., & Roth, D. (2019). Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach. Proceedings of EMNLP 2019, 3914-3923. doi:10.18653/v1/D19-1404
  [31] Laver, M., Benoit, K., & Garry, J. (2003). Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review, 97(2), 311-331. doi:10.1017/S0003055403000698
  [32] Brin, S. & Page, L. (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th International World Wide Web Conference, 107-117.
  [33] Tauberer, J. (2012). Open Government Data: The Book. GovTrack.us methodology for ideology and leadership scoring via cosponsorship analysis. govtrack.us/about/analysis

Questions about our methodology? Disagree with a score? We welcome scrutiny. This project is built on the belief that transparency is non-negotiable.