
Building a Multi-Source RSS Aggregator: Goodreads + Serverless Lambda

September 8, 2025 · 5 min read · by Zach Liibbe

How I built a resilient RSS aggregator using serverless Lambda functions to parse Goodreads feeds, handle XML inconsistencies, and provide reliable book data for my personal website.


When I decided to display my reading activity on my personal website, I quickly discovered that working with RSS feeds in 2025 isn't as straightforward as it might seem. Goodreads provides RSS feeds, but they're inconsistent, sometimes malformed, and definitely not designed for modern web applications.

Here's how I built a robust RSS aggregator using serverless Lambda functions that handles real-world XML parsing challenges and provides clean, reliable data for my website.

The Challenge: RSS Feeds Are Messy

RSS feeds, especially from platforms like Goodreads, come with several challenges:

- **Inconsistent Structure**: Fields appear and disappear between entries
- **HTML Entities**: Text is often encoded with `&amp;`, `&lt;`, and friends
- **Variable Image URLs**: Cover images might be in `book_large_image_url`, `book_medium_image_url`, or embedded in descriptions
- **Rate Limiting**: Direct browser requests get blocked
- **Performance**: Parsing XML in the browser is slow and unreliable

The Solution: Serverless Lambda Aggregator

I created a serverless function, deployed on AWS Lambda via Netlify Functions, that acts as a middleware layer between Goodreads and my website. Here's the architecture:

```typescript
// Core Lambda handler structure
export const getCurrentlyReading = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  const queryParams = event.queryStringParameters || {};
  const limit = parseInt(queryParams.limit || '5', 10);
  const shelf = 'currently-reading';

  // Check cache first
  const cacheKey = `${shelf}_${limit}`;
  const now = Date.now();

  if (
    cachedData[cacheKey] &&
    now - (lastFetch[cacheKey] || 0) < CACHE_DURATION
  ) {
    return cachedResponse(cachedData[cacheKey]);
  }

  // Fetch and process RSS feed...
};
```
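The `cachedResponse` helper isn't shown in the handler above; a minimal sketch of what it might look like (the name comes from the handler, the body is my assumption) simply wraps the cached data in an API Gateway-style result:

```typescript
// Hypothetical cachedResponse helper; the real implementation may differ.
// It wraps already-fetched data in an API Gateway-style result object.
function cachedResponse(data: unknown) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json',
      'Cache-Control': 'public, max-age=300',
    },
    body: JSON.stringify(data),
  };
}
```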

Key Architecture Decisions

1. In-Memory Caching

```typescript
// Simple but effective caching
let cachedData: { [key: string]: any } = {};
let lastFetch: { [key: string]: number } = {};
const CACHE_DURATION = 1000 * 60 * 5; // 5 minutes
```
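The freshness check buried in the handler can be isolated into a small predicate; this is a sketch for illustration, and `isFresh` is my naming rather than anything from the deployed code:

```typescript
const CACHE_DURATION = 1000 * 60 * 5; // 5 minutes, matching the constant above

// True when a cache entry exists and is younger than CACHE_DURATION.
function isFresh(lastFetchedAt: number | undefined, now: number): boolean {
  return lastFetchedAt !== undefined && now - lastFetchedAt < CACHE_DURATION;
}
```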

2. Resilient XML Parsing

```typescript
function decodeHtmlEntities(text: string): string {
  return text
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&apos;/g, "'");
}
```
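One subtlety with chained `.replace` calls is ordering: decoding `&amp;` first means a literal `&amp;lt;` in the feed gets decoded twice. A table-driven variant (my sketch, not the deployed code) sidesteps this by decoding every entity in a single pass:

```typescript
// Entity lookup table; decoded in one pass so no output is re-decoded.
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
  '&apos;': "'",
};

function decodeEntities(text: string): string {
  return text.replace(/&(?:amp|lt|gt|quot|#39|apos);/g, (m) => ENTITIES[m]);
}
```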

3. Flexible Image Extraction

```typescript
// Handle multiple possible image URL formats
let coverImg = null;
if (item.book_large_image_url) {
  coverImg = item.book_large_image_url;
} else if (item.book_medium_image_url) {
  coverImg = item.book_medium_image_url;
} else if (item.book_small_image_url) {
  coverImg = item.book_small_image_url;
} else if (item.description) {
  // Extract from HTML description as fallback
  const imgMatch = item.description.match(/<img[^>]+src=["']([^"']+)["']/i);
  if (imgMatch && imgMatch[1]) {
    coverImg = imgMatch[1];
  }
}
```
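To see the fallback in action, here's that regex run against a description shaped like Goodreads markup (the sample HTML is illustrative, not copied from a real feed):

```typescript
// Sample description HTML (illustrative; real Goodreads markup varies)
const description =
  '<a href="https://www.goodreads.com/book/show/1"><img alt="Dune" ' +
  'src="https://images.gr-assets.com/books/123.jpg" /></a> Loved it.';

// Capture the src attribute of the first <img> tag.
const imgMatch = description.match(/<img[^>]+src=["']([^"']+)["']/i);
const coverImg = imgMatch ? imgMatch[1] : null;
```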

Real-World XML Parsing Challenges

Challenge 1: Array vs. Single Item Inconsistency

Depending on the parser, an RSS channel's items come back as an array when there are multiple entries, but as a single object when there's only one. This breaks everything downstream:

```typescript
// Ensure items is always an array
const items = Array.isArray(result.rss.channel.item)
  ? result.rss.channel.item
  : [result.rss.channel.item];
```
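The same normalization generalizes into a tiny helper (my generic version of the ternary above, not part of the original code), handy anywhere the parser may hand back a single node, an array, or nothing at all:

```typescript
// Normalize a parsed XML value to an array, treating undefined as empty.
function toArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}
```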

Challenge 2: HTML Instead of XML

Sometimes Goodreads returns HTML error pages instead of XML:

```typescript
// Detect HTML responses
if (
  xml.trim().startsWith('<!DOCTYPE') ||
  xml.trim().startsWith('<html')
) {
  console.error('Received HTML instead of XML from Goodreads');
  throw new Error(
    'Goodreads returned HTML instead of RSS feed - they may be blocking automated requests'
  );
}
```
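Body sniffing pairs well with a `Content-Type` check on the response; a combined guard might look like this (a sketch; the exact heuristics are my assumptions, not the deployed code):

```typescript
// Returns true when a response smells like an HTML error page rather than RSS.
function looksLikeHtml(body: string, contentType: string): boolean {
  const trimmed = body.trim().toLowerCase();
  return (
    contentType.toLowerCase().includes('text/html') ||
    trimmed.startsWith('<!doctype html') ||
    trimmed.startsWith('<html')
  );
}
```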

Challenge 3: Missing or Malformed Data

Real RSS feeds have missing fields, null values, and unexpected structures:

```typescript
const books = limitedItems.map((item: any) => {
  const title = item.title ? decodeHtmlEntities(item.title) : 'Unknown Title';
  const author = item.author_name
    ? decodeHtmlEntities(item.author_name)
    : 'Unknown Author';

  // Safely parse rating with fallback
  const rating = item.user_rating ? parseFloat(item.user_rating) : 0;
  const dateRead = item.user_read_at || null;

  return {
    title,
    author,
    coverImg,
    link: bookUrl,
    rating,
    dateRead,
  };
});
```

Client-Side Integration

On the frontend, I consume this Lambda through a simple API call:

```typescript
// Next.js API route that calls the Lambda
import { NextRequest, NextResponse } from 'next/server';

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const shelf = searchParams.get('shelf') || 'read';

  try {
    const lambdaUrl = `https://goodreads-lambda.netlify.app/.netlify/functions/goodreads-lambda?shelf=${shelf}`;

    const response = await fetch(lambdaUrl, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (compatible; GoodreadsApp/1.0)',
      },
    });

    const data = await response.json();
    return NextResponse.json(data);
  } catch (error) {
    // Graceful fallback when the Lambda is unreachable
    return NextResponse.json({ error: 'Service temporarily unavailable' });
  }
}
```
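String interpolation works here because `shelf` arrives already decoded from `searchParams`; for anything with more parameters, `URL` and `URLSearchParams` keep the query string correctly encoded. A small builder sketch (the helper name is mine):

```typescript
// Builds the Lambda URL with properly encoded query parameters.
function buildLambdaUrl(base: string, shelf: string, limit?: number): string {
  const url = new URL(base);
  url.searchParams.set('shelf', shelf);
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  return url.toString();
}
```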

Performance and Reliability

Caching Strategy

The Lambda implements a three-tier caching approach:

1. **In-Memory Cache**: 5-minute cache within the Lambda function
2. **HTTP Headers**: `Cache-Control: public, max-age=300`
3. **Client-Side Cache**: Additional caching in the Next.js application

Error Handling

```typescript
try {
  // RSS processing logic
} catch (error) {
  console.error('Lambda error:', error);
  return {
    statusCode: 500,
    headers: corsHeaders,
    body: JSON.stringify({
      error: error instanceof Error ? error.message : 'Unknown error',
      status: 'error',
      timestamp: new Date().toISOString(),
    }),
  };
}
```

CORS Configuration

```typescript
const headers = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
  'Cache-Control': 'public, max-age=300',
};
```
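Since the allowed methods include OPTIONS, the function also has to answer CORS preflight requests. A minimal sketch (the deployed handler may do this differently):

```typescript
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

// Answer a CORS preflight with 204 No Content plus the CORS headers.
function handlePreflight() {
  return { statusCode: 204, headers: corsHeaders, body: '' };
}
```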

Deployment and Monitoring

The Lambda is deployed using Netlify Functions with a simple netlify.toml:

```toml
[build]
  functions = "functions"

[functions]
  node_bundler = "esbuild"

[[redirects]]
  from = "/api/*"
  to = "/.netlify/functions/:splat"
  status = 200
```

Results and Lessons Learned

This serverless RSS aggregator now reliably serves book data to my website with:

- **99.9% uptime** through Netlify's infrastructure
- **Sub-200ms response times** with effective caching
- **Graceful degradation** when Goodreads is unavailable
- **Clean, consistent data** regardless of RSS feed quirks

Key Takeaways

- **RSS feeds require defensive programming** - assume nothing about structure
- **Caching is essential** - both for performance and reliability
- **Serverless is perfect for this use case** - it handles traffic spikes and reduces costs
- **Error boundaries matter** - graceful degradation keeps your site working
- **HTML entity decoding is crucial** - don't forget this step

What's Next?

I'm planning to extend this system to:

- Support additional book platforms (StoryGraph, Amazon)
- Add reading progress tracking
- Implement webhook-based cache invalidation
- Add metrics and monitoring dashboards

The complete source code for this RSS aggregator is available in my GitHub repository, and you can see it in action on my personal website.


_Want to see more technical deep-dives like this? Follow my journey as I build in public and share what I learn along the way._
