
Building a Multi-Source RSS Aggregator: Goodreads + Serverless Lambda

September 8, 2025 · 5 min read · by Zach Liibbe

How I built a resilient RSS aggregator using serverless Lambda functions to parse Goodreads feeds, handle XML inconsistencies, and provide reliable book data for my personal website.


When I decided to display my reading activity on my personal website, I quickly discovered that working with RSS feeds in 2025 isn't as straightforward as it might seem. Goodreads provides RSS feeds, but they're inconsistent, sometimes malformed, and definitely not designed for modern web applications.

Here's how I built a robust RSS aggregator using serverless Lambda functions that handles real-world XML parsing challenges and provides clean, reliable data for my website.

The Challenge: RSS Feeds Are Messy

RSS feeds, especially from platforms like Goodreads, come with several challenges:

- **Inconsistent Structure**: Fields appear and disappear between entries
- **HTML Entities**: Text is often encoded with `&amp;`, `&lt;`, and friends
- **Variable Image URLs**: Cover images might be in `book_large_image_url`, `book_medium_image_url`, or embedded in descriptions
- **Rate Limiting**: Direct browser requests get blocked
- **Performance**: Parsing XML in the browser is slow and unreliable

The Solution: Serverless Lambda Aggregator

I created a serverless function, deployed on AWS Lambda via Netlify Functions, that acts as a middleware layer between Goodreads and my website. Here's the architecture:

```typescript
// Core Lambda handler structure
export const getCurrentlyReading = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  const queryParams = event.queryStringParameters || {};
  const limit = parseInt(queryParams.limit || '5', 10);
  const shelf = 'currently-reading';

  // Check cache first
  const cacheKey = `${shelf}_${limit}`;
  const now = Date.now();

  if (
    cachedData[cacheKey] &&
    now - (lastFetch[cacheKey] || 0) < CACHE_DURATION
  ) {
    return cachedResponse(cachedData[cacheKey]);
  }

  // Fetch and process RSS feed...
};
```
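The `cachedResponse` helper isn't shown in the handler above; a minimal sketch of what it might look like (the name comes from the handler, the body is my assumption) simply wraps the cached data in an API Gateway-style result:

```typescript
// Hypothetical cachedResponse helper; the real implementation may differ.
// It wraps already-fetched data in an API Gateway-style result object.
function cachedResponse(data: unknown) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json',
      'Cache-Control': 'public, max-age=300',
    },
    body: JSON.stringify(data),
  };
}
```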

Key Architecture Decisions

1. In-Memory Caching

```typescript
// Simple but effective caching
let cachedData: { [key: string]: any } = {};
let lastFetch: { [key: string]: number } = {};
const CACHE_DURATION = 1000 * 60 * 5; // 5 minutes
```
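The freshness check buried in the handler can be isolated into a small predicate; this is a sketch for illustration, and `isFresh` is my naming rather than anything from the deployed code:

```typescript
const CACHE_DURATION = 1000 * 60 * 5; // 5 minutes, matching the constant above

// True when a cache entry exists and is younger than CACHE_DURATION.
function isFresh(lastFetchedAt: number | undefined, now: number): boolean {
  return lastFetchedAt !== undefined && now - lastFetchedAt < CACHE_DURATION;
}
```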

2. Resilient XML Parsing

```typescript
function decodeHtmlEntities(text: string): string {
  return text
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&apos;/g, "'");
}
```
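One subtlety with chained `.replace` calls is ordering: decoding `&amp;` first means a literal `&amp;lt;` in the feed gets decoded twice. A table-driven variant (my sketch, not the deployed code) sidesteps this by decoding every entity in a single pass:

```typescript
// Entity lookup table; decoded in one pass so no output is re-decoded.
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&apos;': "'",
};

function decodeEntities(text: string): string {
  return text.replace(/&(?:amp|lt|gt|quot|#39|apos);/g, (m) => ENTITIES[m]);
}
```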

3. Flexible Image Extraction

```typescript
// Handle multiple possible image URL formats
let coverImg = null;
if (item.book_large_image_url) {
  coverImg = item.book_large_image_url;
} else if (item.book_medium_image_url) {
  coverImg = item.book_medium_image_url;
} else if (item.book_small_image_url) {
  coverImg = item.book_small_image_url;
} else if (item.description) {
  // Extract from HTML description as fallback
  const imgMatch = item.description.match(/<img[^>]+src=["']([^"']+)["']/i);
  if (imgMatch && imgMatch[1]) {
    coverImg = imgMatch[1];
  }
}
```
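To see the fallback in action, here's that regex run against a description shaped like Goodreads markup (the sample HTML is illustrative, not copied from a real feed):

```typescript
// Sample description HTML (illustrative; real Goodreads markup varies)
const description =
  '<a href="https://www.goodreads.com/book/show/1"><img alt="Dune" ' +
  'src="https://images.gr-assets.com/books/123.jpg" /></a> Loved it.';

// Capture the src attribute of the first <img> tag.
const imgMatch = description.match(/<img[^>]+src=["']([^"']+)["']/i);
const coverImg = imgMatch ? imgMatch[1] : null;
```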

Real-World XML Parsing Challenges

Challenge 1: Array vs. Single Item Inconsistency

Depending on the parser, an RSS channel's items come back as an array when there are multiple entries, but as a single object when there's only one. This breaks everything downstream:

```typescript
// Ensure items is always an array
const items = Array.isArray(result.rss.channel.item)
  ? result.rss.channel.item
  : [result.rss.channel.item];
```
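The same normalization generalizes into a tiny helper (my generic version of the ternary above, not part of the original code), handy anywhere the parser may hand back a single node, an array, or nothing at all:

```typescript
// Normalize a parsed XML value to an array, treating undefined as empty.
function toArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}
```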

Challenge 2: HTML Instead of XML

Sometimes Goodreads returns HTML error pages instead of XML:

```typescript
// Detect HTML responses
if (
  xml.trim().startsWith('<!DOCTYPE') ||
  xml.trim().startsWith('<html')
) {
  console.error('Received HTML instead of XML from Goodreads');
  throw new Error(
    'Goodreads returned HTML instead of RSS feed - they may be blocking automated requests'
  );
}
```
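Body sniffing pairs well with a `Content-Type` check on the response; a combined guard might look like this (a sketch; the exact heuristics are my assumptions, not the deployed code):

```typescript
// Returns true when a response smells like an HTML error page rather than RSS.
function looksLikeHtml(body: string, contentType: string): boolean {
  const trimmed = body.trim().toLowerCase();
  return (
    contentType.toLowerCase().includes('text/html') ||
    trimmed.startsWith('<!doctype html') ||
    trimmed.startsWith('<html')
  );
}
```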

Challenge 3: Missing or Malformed Data

Real RSS feeds have missing fields, null values, and unexpected structures:

```typescript
const books = limitedItems.map((item: any) => {
  const title = item.title ? decodeHtmlEntities(item.title) : 'Unknown Title';
  const author = item.author_name
    ? decodeHtmlEntities(item.author_name)
    : 'Unknown Author';

  // Safely parse rating with fallback
  const rating = item.user_rating ? parseFloat(item.user_rating) : 0;
  const dateRead = item.user_read_at || null;

  return {
    title,
    author,
    coverImg,
    link: bookUrl,
    rating,
    dateRead,
  };
});
```

Client-Side Integration

On the frontend, I consume this Lambda through a simple API call:

```typescript
// Next.js API route that calls the Lambda
import { NextRequest, NextResponse } from 'next/server';

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const shelf = searchParams.get('shelf') || 'read';

  try {
    const lambdaUrl = `https://goodreads-lambda.netlify.app/.netlify/functions/goodreads-lambda?shelf=${shelf}`;

    const response = await fetch(lambdaUrl, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (compatible; GoodreadsApp/1.0)',
      },
    });

    const data = await response.json();
    return NextResponse.json(data);
  } catch (error) {
    // Graceful fallback when the Lambda is unreachable
    return NextResponse.json({ error: 'Service temporarily unavailable' });
  }
}
```
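String interpolation works here because `shelf` arrives already decoded from `searchParams`; for anything with more parameters, `URL` and `URLSearchParams` keep the query string correctly encoded. A small builder sketch (the helper name is mine):

```typescript
// Builds the Lambda URL with properly encoded query parameters.
function buildLambdaUrl(base: string, shelf: string, limit?: number): string {
  const url = new URL(base);
  url.searchParams.set('shelf', shelf);
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  return url.toString();
}
```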

Performance and Reliability

Caching Strategy

The Lambda implements a three-tier caching approach:

1. **In-Memory Cache**: 5-minute cache within the Lambda function
2. **HTTP Headers**: `Cache-Control: public, max-age=300`
3. **Client-Side Cache**: Additional caching in the Next.js application

Error Handling

```typescript
try {
  // RSS processing logic
} catch (error) {
  console.error('Lambda error:', error);
  return {
    statusCode: 500,
    headers: corsHeaders,
    body: JSON.stringify({
      error: error instanceof Error ? error.message : 'Unknown error',
      status: 'error',
      timestamp: new Date().toISOString(),
    }),
  };
}
```

CORS Configuration

```typescript
const headers = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
  'Cache-Control': 'public, max-age=300',
};
```
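Since the allowed methods include OPTIONS, the function also has to answer CORS preflight requests. A minimal sketch (the deployed handler may do this differently):

```typescript
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

// Answer a CORS preflight with 204 No Content plus the CORS headers.
function handlePreflight() {
  return { statusCode: 204, headers: corsHeaders, body: '' };
}
```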

Deployment and Monitoring

The Lambda is deployed using Netlify Functions with a simple netlify.toml:

```toml
[build]
  functions = "functions"

[functions]
  node_bundler = "esbuild"

[[redirects]]
  from = "/api/*"
  to = "/.netlify/functions/:splat"
  status = 200
```

Results and Lessons Learned

This serverless RSS aggregator now reliably serves book data to my website with:

- **99.9% uptime** through Netlify's infrastructure
- **Sub-200ms response times** with effective caching
- **Graceful degradation** when Goodreads is unavailable
- **Clean, consistent data** regardless of RSS feed quirks

Key Takeaways

- **RSS feeds require defensive programming** - assume nothing about structure
- **Caching is essential** - both for performance and reliability
- **Serverless is perfect for this use case** - it handles traffic spikes and reduces costs
- **Error boundaries matter** - graceful degradation keeps your site working
- **HTML entity decoding is crucial** - don't forget this step

What's Next?

I'm planning to extend this system to:

- Support additional book platforms (StoryGraph, Amazon)
- Add reading progress tracking
- Implement webhook-based cache invalidation
- Add metrics and monitoring dashboards

The complete source code for this RSS aggregator is available in my GitHub repository, and you can see it in action on my personal website.


_Want to see more technical deep-dives like this? Follow my journey as I build in public and share what I learn along the way._
