Joined 2 months ago
Cake day: September 27th, 2024

  • Comments are also easy: the API allows pulling them by latest too. If I were writing a search engine, I would probably track all known instances and pull only local content from each one instead of deduplicating. I haven’t really looked at how votes are federated though, so those might be more complicated to keep updated.
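
    To make the "local content only" idea concrete, here is a rough sketch of building the request URLs for each tracked instance. The `/api/v3/comment/list` path and the `type_=Local` and `sort=New` parameters are my understanding of the v3 API and should be checked against the instance's version; the function names and the instance list are made up for illustration.

```rust
/// Builds the URL for one page of an instance's own (local) comments, newest first.
/// `instance` is a bare host name such as "programming.dev".
/// Assumption: the v3 API accepts `type_`, `sort` and `page` on this endpoint.
fn local_comments_url(instance: &str, page: u32) -> String {
    format!(
        "https://{}/api/v3/comment/list?type_=Local&sort=New&page={}",
        instance, page
    )
}

/// Returns the first-page URL for every tracked instance.
fn first_page_urls(instances: &[&str]) -> Vec<String> {
    instances
        .iter()
        .map(|host| local_comments_url(host, 1))
        .collect()
}

fn main() {
    // Hypothetical list of known instances; a real crawler would discover these.
    let instances = ["programming.dev", "lemmy.ml"];
    for url in first_page_urls(&instances) {
        println!("{}", url);
    }
}
```

    Since each instance is only asked for content that originated there, every comment should be fetched once from its home instance, which is why no deduplication step is needed.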

    I expect syncing posts and comments from all instances to be mostly easy. In the past I was able to pull all posts and comments from smaller instances in under 10 minutes. It’s mostly text, so it doesn’t take that long. After the first pull, it can be kept mostly up to date by only pulling back to the last date received, which should take much less time than the first sync.
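
    The "pull back to the last date received" step could look roughly like this. It is only a sketch: `Item`, `take_new` and the use of plain epoch seconds for `published` are assumptions for illustration, and it relies on pages arriving sorted newest first.

```rust
/// Minimal stand-in for a fetched post or comment.
/// Assumption: `published` is seconds since the Unix epoch.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: u64,
    published: u64,
}

/// Given one page sorted newest first, keeps only items newer than `last_seen`.
/// The boolean is true when an already-synced item was reached, so paging can stop.
fn take_new(page: &[Item], last_seen: u64) -> (Vec<Item>, bool) {
    let fresh: Vec<Item> = page
        .iter()
        .take_while(|item| item.published > last_seen)
        .cloned()
        .collect();
    let reached_old = fresh.len() < page.len();
    (fresh, reached_old)
}

fn main() {
    let page = vec![
        Item { id: 3, published: 300 },
        Item { id: 2, published: 200 },
        Item { id: 1, published: 100 },
    ];
    // Everything up to timestamp 150 was stored by the previous sync.
    let (fresh, done) = take_new(&page, 150);
    println!("new items: {:?}, stop paging: {}", fresh, done);
}
```

    A sync loop would call this on each page and stop requesting further pages as soon as the flag is true, then store the newest `published` value as the next cutoff.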

    I’ve noticed there’s lots of stuff on Lemmy that fails to federate to other instances. I think there’s also a 3000€ reward at the moment for improving federation, so if you spend a lot of time on it, it might be a good idea to see if it can be claimed. I don’t really know how the milestone system works though, and it might only be available to inside contributors.


  • It’s also possible to just pull all posts from an instance. The API is easy to understand.

    main.rs
    use serde::{Deserialize, Serialize};
    use std::collections::HashMap;
    
    fn main() {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
    
        let to_page = Some(5);
        let posts = rt.block_on(get_posts_to(to_page)).unwrap();
    
        println!("-----------");
        println!(
            "All posts to page {} as JSON:",
            to_page.map(|v| v.to_string()).unwrap_or("infinity".into())
        );
        println!("-----------");
        println!("{}", serde_json::to_string(&posts).unwrap());
    }
    
    #[derive(Serialize, Deserialize, Debug, Clone)]
    struct PostData {
        id: usize,
        name: String,
    }
    
    #[derive(Serialize, Deserialize, Debug)]
    struct PageItem {
        post: PostData,
    }
    
    #[derive(Serialize, Deserialize, Debug)]
    struct PostPageResult {
        posts: Vec<PageItem>,
    }
    
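    /// Fetches one page of posts, keyed by post id. Returns `Err(())` if the request
    /// or parsing fails, or if the page is empty (which marks the end of the listing).
    async fn get_page(index: usize) -> Result<HashMap<usize, PostData>, ()> {
        // The v3 API names the listing-type parameter `type_`; `dataType` and
        // `listingType` are web UI URL parameters and are not read by the API.
        let url = format!(
            "https://programming.dev/api/v3/post/list?type_=All&sort=New&page={}",
            index
        );
    
        let response = reqwest::get(url).await.map_err(|_| ())?;
        let text = response.text().await.map_err(|_| ())?;
    
        // Print the parse error before giving up so a schema mismatch is visible.
        let data: PostPageResult =
            serde_json::from_str(&text).map_err(|e| println!("{:?}", e))?;
    
        let map: HashMap<usize, PostData> = data
            .posts
            .into_iter()
            .map(|item| (item.post.id, item.post))
            .collect();
    
        if map.is_empty() {
            return Err(());
        }
    
        Ok(map)
    }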
    
    /// If `page` is not `None`, stops after that many pages. Otherwise continues until a page comes back empty or a request fails.
    async fn get_posts_to(page: Option<usize>) -> Result<HashMap<usize, PostData>, ()> {
        let mut idx = 1;
        let mut map = HashMap::new();
    
        while let Ok(more_posts) = get_page(idx).await {
            println!("page: {}, {:#?}", idx, more_posts);
            map.extend(more_posts.into_iter());
            idx += 1;
    
            if let Some(page) = page {
                if idx > page {
                    break;
                }
            }
        }
    
        Ok(map)
    }
    
    
    Cargo.toml
    [package]
    name = "lemmyposts"
    version = "0.1.0"
    edition = "2021"
    
    [dependencies]
    reqwest = "0.12.7"
    serde = { version = "1.0", features = ["derive"] }
    serde_json = "1.0.128"
    tokio = { version = "1.4", features = ["rt"] }