Building a Multi-Source RSS Aggregator: Goodreads + Serverless Lambda
When I decided to display my reading activity on my personal website, I quickly discovered that working with RSS feeds in 2025 isn't as straightforward as it might seem. Goodreads provides RSS feeds, but they're inconsistent, sometimes malformed, and definitely not designed for modern web applications.
Here's how I built a robust RSS aggregator using serverless Lambda functions that handles real-world XML parsing challenges and provides clean, reliable data for my website.
The Challenge: RSS Feeds Are Messy
RSS feeds, especially from platforms like Goodreads, come with several challenges:
- Inconsistent Structure: Fields appear and disappear between entries
- HTML Entities: Text is often encoded with &amp;, &lt;, etc.
- Variable Image URLs: Cover images might be in book_large_image_url, book_medium_image_url, or embedded in descriptions
- Rate Limiting: Direct browser requests get blocked
- Performance: Parsing XML in the browser is slow and unreliable
The Solution: Serverless Lambda Aggregator
I created a serverless function that acts as a middleware layer between Goodreads and my website. It is deployed as a Netlify Function, which runs on AWS Lambda under the hood. Here's the core handler:
typescript
// Core Lambda handler structure
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

export const getCurrentlyReading = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  const queryParams = event.queryStringParameters || {};
  const limit = parseInt(queryParams.limit || '5', 10);
  const shelf = 'currently-reading';

  // Check cache first
  const cacheKey = `${shelf}_${limit}`;
  const now = Date.now();

  if (
    cachedData[cacheKey] &&
    now - (lastFetch[cacheKey] || 0) < CACHE_DURATION
  ) {
    return cachedResponse(cachedData[cacheKey]);
  }

  // Fetch and process RSS feed...
};
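The handler calls cachedResponse, which this post doesn't show. Here is a minimal sketch of what such a helper might look like, assuming it simply wraps the cached payload in a 200 response carrying the same Cache-Control value used elsewhere in this post. The ProxyResult type is a local stand-in for APIGatewayProxyResult, and the X-Cache header is a hypothetical addition for debugging.

```typescript
// Local stand-in for APIGatewayProxyResult so the sketch is self-contained.
interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: wrap an already-cached payload in a 200 response.
function cachedResponse(data: unknown): ProxyResult {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json',
      'Cache-Control': 'public, max-age=300',
      'X-Cache': 'HIT', // assumption: not part of the original code
    },
    body: JSON.stringify(data),
  };
}
```

Keeping response construction in one place means cached and fresh responses can't drift apart in their headers.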
Key Architecture Decisions
1. In-Memory Caching
typescript
// Simple but effective caching
let cachedData: { [key: string]: any } = {};
let lastFetch: { [key: string]: number } = {};
const CACHE_DURATION = 1000 * 60 * 5; // 5 minutes
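One caveat: module-level variables like these only survive while the function container stays warm, so a cold start always begins with an empty cache. The two-object approach can also be folded into a small class. The sketch below is one way to do that, not the code running on my site; TtlCache and its injectable clock are names I'm introducing for illustration.

```typescript
// Minimal TTL cache sketch. The clock is injectable so expiry can be tested.
class TtlCache<T> {
  private entries = new Map<string, { value: T; storedAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: drop it so memory doesn't grow
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Usage: a 5-minute cache keyed by shelf and limit.
const bookCache = new TtlCache<unknown[]>(1000 * 60 * 5);
bookCache.set('currently-reading_5', []);
```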
2. Resilient XML Parsing
typescript
function decodeHtmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&apos;/g, "'")
    // Decode &amp; last so "&amp;lt;" becomes "&lt;" rather than "<"
    .replace(/&amp;/g, '&');
}
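Chained replacements only cover the entities you list. If a feed also contains numeric entities, a single-pass decoder is one alternative. The sketch below assumes only the five named entities shown plus decimal and hex forms; decodeEntities and NAMED_ENTITIES are illustrative names, and this is not a full HTML entity table.

```typescript
// Named entities handled by this sketch; extend as your feed requires.
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', "'"],
]);

// Decode named, decimal (&#39;) and hex (&#x27;) entities in a single pass.
// A single pass means "&amp;lt;" becomes "&lt;" and is not decoded twice.
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body: string) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points untouched rather than throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left as-is.
    return NAMED_ENTITIES.get(body.toLowerCase()) ?? match;
  });
}
```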
3. Flexible Image Extraction
typescript
// Handle multiple possible image URL formats
let coverImg: string | null = null;
if (item.book_large_image_url) {
  coverImg = item.book_large_image_url;
} else if (item.book_medium_image_url) {
  coverImg = item.book_medium_image_url;
} else if (item.book_small_image_url) {
  coverImg = item.book_small_image_url;
} else if (item.description) {
  // Extract from HTML description as fallback
  const imgMatch = item.description.match(/<img[^>]+src="([^"]+)"/);
  if (imgMatch && imgMatch[1]) {
    coverImg = imgMatch[1];
  }
}
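The same fallback chain can be written as a small function, which makes it easier to test in isolation. This is a sketch under the assumption that every field is an optional string; FeedItem and pickCoverImage are names introduced here for illustration.

```typescript
// Shape of the fields this sketch reads; real feed items carry many more.
interface FeedItem {
  book_large_image_url?: string;
  book_medium_image_url?: string;
  book_small_image_url?: string;
  description?: string;
}

// Try each dedicated field in order of preference, then fall back to the
// first <img src="..."> found in the HTML description.
function pickCoverImage(item: FeedItem): string | null {
  const candidates = [
    item.book_large_image_url,
    item.book_medium_image_url,
    item.book_small_image_url,
  ];
  for (const url of candidates) {
    // Skip missing and whitespace-only values.
    if (typeof url === 'string' && url.trim() !== '') return url.trim();
  }
  const match = item.description?.match(/<img[^>]+src=["']([^"']+)["']/i);
  return match?.[1] ?? null;
}
```

Unlike the if/else chain, this version also skips fields that are present but blank.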
Real-World XML Parsing Challenges
Challenge 1: Array vs Single Item Inconsistency
Many XML-to-object parsers return an array when a channel has multiple items, but a single object when there's only one. Any code that calls array methods on the result then breaks:
typescript
// Ensure items is always an array
const items = Array.isArray(result.rss.channel.item)
  ? result.rss.channel.item
  : [result.rss.channel.item];
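One gap in the ternary: if the shelf is empty and item is missing entirely, it produces [undefined], which fails later when the items are mapped. A small helper covers that case too. toArray is a name I'm introducing for this sketch.

```typescript
// Normalize "zero, one, or many" parser output into a plain array.
function toArray<T>(value: T | T[] | null | undefined): T[] {
  if (value === null || value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Usage with three shapes a parser might hand back:
const none = toArray(undefined);            // empty array
const one = toArray({ title: 'Dune' });     // array with one item
const many = toArray([{ title: 'A' }, { title: 'B' }]); // unchanged
```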
Challenge 2: HTML Instead of XML
Sometimes Goodreads returns HTML error pages instead of XML:
typescript
// Detect HTML responses
if (
  xml.trim().startsWith('<!DOCTYPE html') ||
  xml.trim().startsWith('<html')
) {
  console.error('Received HTML instead of XML from Goodreads');
  throw new Error(
    'Goodreads returned HTML instead of RSS feed - they may be blocking automated requests'
  );
}
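A prefix check like this is case-sensitive, so a lowercase doctype would slip through. The sketch below shows a slightly more tolerant heuristic, plus a positive check that the body starts like a feed. Both function names are hypothetical, and neither is a substitute for actually parsing the document.

```typescript
// Normalize the start of the body: trim whitespace (String.prototype.trim
// also removes a leading byte-order mark) and lowercase it for comparison.
function headOf(body: string): string {
  return body.trim().slice(0, 100).toLowerCase();
}

// Heuristic: does the response body look like an HTML page?
function looksLikeHtml(body: string): boolean {
  const head = headOf(body);
  return head.startsWith('<!doctype html') || head.startsWith('<html');
}

// Heuristic: does the body start like an XML/RSS document?
function looksLikeFeed(body: string): boolean {
  const head = headOf(body);
  return head.startsWith('<?xml') || head.startsWith('<rss');
}
```

Checking for what you expect (a feed) rather than only for what you fear (HTML) also catches empty bodies and plain-text error messages.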
Challenge 3: Missing or Malformed Data
Real RSS feeds have missing fields, null values, and unexpected structures:
typescript
const books = limitedItems.map((item: any) => {
  const title = item.title ? decodeHtmlEntities(item.title) : 'Unknown Title';
  const author = item.author_name
    ? decodeHtmlEntities(item.author_name)
    : 'Unknown Author';

  // Safely parse rating with fallback (missing or non-numeric values become 0)
  const rating = parseFloat(item.user_rating) || 0;
  const dateRead = item.user_read_at || null;

  // coverImg and bookUrl are computed per item elsewhere (omitted here)
  return {
    title,
    author,
    coverImg,
    link: bookUrl,
    rating,
    dateRead,
  };
});
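If you want stricter normalization, the rating and date fields can each get their own parser. This sketch assumes ratings sit on a 0 to 5 scale and that dates should be emitted as ISO 8601 strings; both are my assumptions, not something the feed guarantees. parseRating and parseDateRead are illustrative names.

```typescript
// Parse a rating that may be missing, empty, or non-numeric; clamp to 0..5.
function parseRating(raw: unknown): number {
  const value = typeof raw === 'number' ? raw : parseFloat(String(raw ?? ''));
  if (!Number.isFinite(value)) return 0;
  return Math.min(5, Math.max(0, value));
}

// Convert a date string to ISO 8601, or null when absent or unparseable.
function parseDateRead(raw: unknown): string | null {
  if (typeof raw !== 'string' || raw.trim() === '') return null;
  const time = Date.parse(raw);
  return Number.isNaN(time) ? null : new Date(time).toISOString();
}
```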
Client-Side Integration
On the frontend, I consume this Lambda through a simple API call:
typescript
// Next.js API route that calls the Lambda
import { NextRequest, NextResponse } from 'next/server';

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const shelf = searchParams.get('shelf') || 'read';

  try {
    const lambdaUrl = `https://goodreads-lambda.netlify.app/.netlify/functions/goodreads-lambda?shelf=${encodeURIComponent(shelf)}`;

    const response = await fetch(lambdaUrl, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (compatible; GoodreadsApp/1.0)',
      },
    });

    if (!response.ok) {
      throw new Error(`Lambda responded with ${response.status}`);
    }

    const data = await response.json();
    return NextResponse.json(data);
  } catch (error) {
    // Report the failure with a 503 so callers can tell it apart from real data
    return NextResponse.json(
      { error: 'Service temporarily unavailable' },
      { status: 503 }
    );
  }
}
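Since the shelf value comes straight from the query string, it's worth validating before it goes into the Lambda URL. The sketch below assumes only the three default Goodreads shelf names are served and caps the limit at 50. Both choices are mine for illustration, as is the buildLambdaUrl name.

```typescript
// Assumption: only these default Goodreads shelf names are served.
const ALLOWED_SHELVES = ['read', 'currently-reading', 'to-read'];

// Base URL as used in the route handler in this post.
const LAMBDA_BASE =
  'https://goodreads-lambda.netlify.app/.netlify/functions/goodreads-lambda';

// Build the Lambda URL, falling back to defaults for unknown input and
// encoding the shelf so it cannot inject extra query parameters.
function buildLambdaUrl(shelf: string | null, limit = 5): string {
  const safeShelf =
    shelf !== null && ALLOWED_SHELVES.indexOf(shelf) !== -1 ? shelf : 'read';
  // Assumption: cap the limit at 50 to keep responses small.
  const safeLimit =
    Number.isInteger(limit) && limit > 0 ? Math.min(limit, 50) : 5;
  return `${LAMBDA_BASE}?shelf=${encodeURIComponent(safeShelf)}&limit=${safeLimit}`;
}
```

An allow-list keeps unexpected shelf names from reaching the Lambda at all, which also keeps the cache from filling up with junk keys.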
Performance and Reliability
Caching Strategy
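The system uses a three-tier caching approach:
- In-Memory Cache: 5-minute cache within the Lambda function
- HTTP Headers: Cache-Control: public, max-age=300
- Client-Side Cache: Additional caching in the Next.js application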
Error Handling
typescript
try {
  // RSS processing logic
} catch (error) {
  console.error('Lambda error:', error);
  return {
    statusCode: 500,
    headers: corsHeaders,
    body: JSON.stringify({
      error: error instanceof Error ? error.message : 'Unknown error',
      status: 'error',
      timestamp: new Date().toISOString(),
    }),
  };
}
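Returning a 500 is honest, but when an older copy of the data is still in memory it can be friendlier to serve that instead. The sketch below shows one way to express the choice. errorOrStale is a hypothetical helper, ProxyResult is a local stand-in for the Lambda result type, and the 60-second max-age on stale responses is an assumption.

```typescript
// Local stand-in for APIGatewayProxyResult so the sketch is self-contained.
interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: on failure, prefer stale cached data over a 500.
function errorOrStale(
  error: unknown,
  stale: unknown,
  headers: Record<string, string>
): ProxyResult {
  if (stale !== undefined) {
    return {
      statusCode: 200,
      // Assumption: a short max-age so clients retry soon after recovery.
      headers: { ...headers, 'Cache-Control': 'public, max-age=60' },
      body: JSON.stringify(stale),
    };
  }
  return {
    statusCode: 500,
    headers,
    body: JSON.stringify({
      error: error instanceof Error ? error.message : 'Unknown error',
      status: 'error',
      timestamp: new Date().toISOString(),
    }),
  };
}
```

In the catch block this would be called with whatever is still held in the in-memory cache for the current key, even if it has passed its normal expiry.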
CORS Configuration
typescript
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
  'Cache-Control': 'public, max-age=300',
};
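Advertising OPTIONS in the allowed methods only helps if the function actually answers preflight requests. Here is a minimal sketch of a method check that could run at the top of the handler. handleMethod is a hypothetical name, and the 405 branch is my addition rather than something in the original function.

```typescript
const preflightHeaders: Record<string, string> = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

// Return a response for methods the handler should short-circuit, or null
// when the request should continue to the normal GET logic.
function handleMethod(httpMethod: string) {
  if (httpMethod === 'OPTIONS') {
    // Preflight: headers only, no body.
    return { statusCode: 204, headers: preflightHeaders, body: '' };
  }
  if (httpMethod !== 'GET') {
    return {
      statusCode: 405,
      headers: { ...preflightHeaders, Allow: 'GET, OPTIONS' },
      body: JSON.stringify({ error: 'Method not allowed' }),
    };
  }
  return null;
}
```

In the handler, this would be called with event.httpMethod, returning early whenever the result is not null.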
Deployment and Monitoring
The Lambda is deployed using Netlify Functions with a simple netlify.toml:
toml
[build]
  functions = "functions"

[functions]
  node_bundler = "esbuild"

[[redirects]]
  from = "/api/*"
  to = "/.netlify/functions/:splat"
  status = 200
Results and Lessons Learned
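This serverless RSS aggregator now reliably serves book data to my website with:
- 99.9% uptime through Netlify's infrastructure
- Sub-200ms response times with effective caching
- Graceful degradation when Goodreads is unavailable
- Clean, consistent data regardless of RSS feed quirks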
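Key Takeaways
- RSS feeds require defensive programming: assume nothing about structure
- Caching is essential, both for performance and reliability
- Serverless fits this use case well: it handles traffic spikes and reduces costs
- Error boundaries matter: graceful degradation keeps your site working
- HTML entity decoding is crucial, so don't forget this step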
What's Next?
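I'm planning to extend this system to:
- Support additional book platforms (StoryGraph, Amazon)
- Add reading progress tracking
- Implement webhook-based cache invalidation
- Add metrics and monitoring dashboards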
The complete source code for this RSS aggregator is available in my GitHub repository, and you can see it in action on my personal website.
_Want to see more technical deep-dives like this? Follow my journey as I build in public and share what I learn along the way._