Parse.ly | Python Data Engineers (Americas) and Machine Learning Engineers (Europe, Middle East, Africa) | Remote | Full-Time | https://parse.ly
Are you a Python programmer based in North or South America, interested in large-scale data processing (terabytes per month, petabytes in our archive), and making use of massively-parallel computing architectures, such as those behind Spark and Dask? Or, are you a Machine Learning Engineer in Europe or the UTC to UTC+3 timezones, interested in making use of modern ML techniques atop an open source Python stack? If so, then you should apply for our fully distributed team, since we are hiring for both roles.
Python Data Engineers[1] write code that runs on hundreds of cloud nodes, using best-in-class distributed database technologies like Kafka, Cassandra, and Elasticsearch. Machine Learning Engineers[2] use cutting-edge techniques to move our analytics, recommendation, and natural language processing stack forward; they have built working production systems using word embeddings, topic clustering, and deep learning. Together, members of these two teams power a massive time series analytics engine and content crawl database that offers an elegant user experience to hundreds of enterprise customers. And we do all of this on a team that's small enough to be nimble, but large enough to be dangerous: 15 total engineers, growing to 25 by the end of 2021.
Join us to build the world's best content analytics system. Apply at work@parsely.com with a couple of paragraphs describing why you're interested, and mark your subject line as either "Python Data Engineer via HN" (if in North America or South America) or "Machine Learning Engineer via HN" (if in Europe or UTC to UTC+3 timezones). Include a link to any portfolio/code you think is relevant, and/or your CV.
You'd be entering our hiring process early, as we only just kicked off our hiring wave in June 2021, after Parse.ly's recent acquisition by Automattic[3] -- one of the world's largest fully distributed teams, and one of the biggest champions of open source and open web technologies.
Return On Art | Vienna, Austria | Front End Engineer | Full-time | ONSITE | https://returnonart.com/
Return on Art is an online art platform experiencing explosive sales growth, prompting us to hire a dedicated front-end developer.
Create and maintain a beautiful mobile-focused eCommerce website for art patrons around the world! To be a good fit for this role, you should be intrigued by the opportunity to make key decisions for our front-end technology stack and to own the responsibility to build slick user experiences.
What you'll do
- Write JavaScript code using best practices
- Come up with novel designs to create a polished art discovery and purchase experience
- Creatively prototype and design new features
- Utilize analytics data and A/B tests to discover and overcome customer pain points
How to get an interview
While work experience is good, we ultimately value results over experience. We ask that you point us toward previous work that showcases your frontend and user interface skills. This could be a personal or professional project that's running live; a code repository with a description of what it does; screenshots of past products you have worked on, showing what they looked like in their shipped state, with some explanation of which parts you worked on and how they were built; or any other type of portfolio or previous work that you can walk us through.
Please send your application and pointers to previous work to careers@returnonart.com
I'm one of the authors of the article.
Quick answer: The data shows that if you're a journalist about to write your next article, you're likely to get more views if you write it about Clinton rather than Trump. It's true that Trump got more pageviews overall, but that seems to be mostly because way more articles were written about him in the first place.
Long Answer: In the article we suggest that if publishers had written more articles about Clinton, they would have received more page views, because in the data we observe that posts on Clinton receive more page views on average. Similarly, we suggest that writing more articles on Bernie Sanders would have caused an increase in referrals from social and search. As with all non-experimental approaches to causal inference, valid conclusions require strong assumptions. In the case of this analysis, we assume that the average number of page views that articles on a candidate receive is independent of the number of articles written on that candidate. If it were the case that writing more articles on Clinton, and fewer on Trump, would have caused Clinton articles to receive fewer views, and Trump articles to receive more, then our conclusions might be wrong.
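To make the independence assumption concrete, here is a minimal Python sketch. All numbers are made up for illustration and are not from the article's data. The function `avg_with_diminishing_returns` and its `elasticity` parameter are a hypothetical way to model the alternative, not something estimated from the data.

```python
# Hypothetical numbers for illustration only; NOT taken from the article's data.
ARTICLES = {"Trump": 1000, "Clinton": 400}
AVG_VIEWS = {"Trump": 500.0, "Clinton": 600.0}


def avg_with_diminishing_returns(base_avg, base_n, new_n, elasticity):
    """Hypothetical model: average views scale as (base_n / new_n) ** elasticity.

    elasticity = 0 reproduces the independence assumption (average unchanged);
    elasticity = 1 means total views per candidate stay fixed no matter how
    many articles are written.
    """
    return base_avg * (base_n / new_n) ** elasticity


def total_views(articles, elasticity):
    """Total page views for an allocation of articles across candidates."""
    return sum(
        n * avg_with_diminishing_returns(AVG_VIEWS[c], ARTICLES[c], n, elasticity)
        for c, n in articles.items()
    )


# Counterfactual: move 100 articles from Trump to Clinton.
shifted = {"Trump": 900, "Clinton": 500}

gain_independent = total_views(shifted, 0.0) - total_views(ARTICLES, 0.0)
gain_diminishing = total_views(shifted, 1.0) - total_views(ARTICLES, 1.0)

print(gain_independent)  # positive: the shift pays off under independence
print(gain_diminishing)  # approximately zero: the shift gains nothing
```

Under independence, every article moved to the candidate with the higher average adds the difference between the two averages. With strong enough diminishing returns that gain disappears, which is exactly the case where the conclusion would not hold.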
Trump is the head, and the other candidates are the long tail.
You said: "It's true that Trump got more pageviews overall, but that seems to be mostly because way more articles were written about him in the first place."
And: "we suggest that if publishers would have written more articles about Clinton, they would have received more page views, because in the data we observe posts on Clinton receive more page views on average"
This seems to be the wrong conclusion because of diminishing returns. Writing more articles about Clinton should still push down the average page views. There is only so much interest, and only so much new to write about every day. None of the candidates can create fresh new controversies to feed the media the way Trump can. The question is how much that would push it down. Given that these sites operate in a market, I don't think it would be inaccurate to suggest that it would push the average significantly below Trump's.
I don't believe a base that strong exists per article, where any article is guaranteed to get some absolute number of page views. If diminishing returns aren't present or are extremely weak, then I'm wrong.
If anything, the data doesn't rule out that sites/reporters are correctly maximizing Trump coverage. Or they may not be maximizing enough since absolute demand is so high and Trump generates so much fresh content. If you can write about one easy topic, and maintain an average that high with only a small decrease in the average, you are doing more with less.
In the comment above, I try to clearly lay out that my conclusion rests on the assumption that for a given candidate, avg views per article and number of articles are independent.
I agree that if you reject this assumption, and instead assume that there are 'diminishing returns', then the conclusion I arrived at could be wrong.
There probably is some kind of diminishing return effect, but we don't know how strong it is. It could be weak compared to the effect that 'readers will consume whatever journalists write'. It's pretty interesting that all of the last four leading candidates (Trump, Cruz, Clinton, and Sanders) had roughly comparable numbers for pageviews per post. That's evidence that readers just pretty much read whatever is published (with the exception of Kasich, a long shot).
It's also true that if you're a journalist right now, faced with the current distribution of articles, you're likely to get more page views by writing your next article on Clinton. This claim doesn't rest on any strong assumptions. That could change if many more articles on Clinton are written, but it's true for now.
If you look at the data in the dashboard, it's also interesting to see that Bernie Sanders gets way more social and search referrals compared to Clinton and Trump.
I think this assumption is dangerous to begin with. It should be proven with data that there are no diminishing returns, and that would be a powerful finding worthy of attention.
And saying that readers will consume whatever journalists write helps power the narrative that the media fueled Trump's campaign. They could write about a different candidate and get slightly improved pageviews, but they're choosing to flood with Trump articles. Your data would only conclude that pageviews aren't driving it.
I think expanding this to "readers will consume whatever journalists write" is a different argument and you would need to establish your "experiment" with a different methodology than the approach used here. The causation seems to be "reporters write news, it exists to be consumed on a site" therefore "readers read it" and that feels like it's missing something to me.
Also, it could be interesting they have comparable numbers per post, but it also backs up the idea that articles exist in response to the demand-supply feedback loop. If sites respond to pageviews, then candidates with lower average pageviews will simply not get as much media attention.
You bring up a good point. Journalists should explore to what extent there are diminishing returns to writing articles on other candidates. Statistics tells us that the most efficient way of exploring this hypothesis is with a multi-armed bandit algorithm. But before I go into that, I think it makes sense to break this problem down into two questions:
(1) Given equally interesting ideas for articles to write on each candidate, which candidate should a journalist write on?
(2) How much investment is required to write an interesting article on each candidate? It might require less work to write something interesting on Trump than on Clinton.
The right way of answering question (1) is with a dynamic multi-armed bandit algorithm. Such an algorithm dynamically explores the problem of diminishing returns. At this point, given the data we have, such an algorithm would suggest you should write on Clinton the vast majority of the time if you're interested in page views per article, and would suggest you write about Sanders if you're interested in bringing in external referrals from Facebook and Google. If journalists followed the advice of such an algorithm and wrote so many articles on Clinton that readers started to lose interest, then the algorithm would begin to suggest you write on someone else. If there's enough interest in this article, I might write a follow-up where I fit a model that tells journalists what topic to write on, given that it's just as easy to write an article on each topic. I could update this model every once in a while to make sure it detects those diminishing returns in time.
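As a rough illustration of the idea, here is a minimal Python sketch of one simple non-stationary bandit: epsilon-greedy selection with exponentially recency-weighted reward estimates. This is an assumption about what a "dynamic" bandit could look like, not the specific algorithm I would fit. The class name, the parameters, and the page view numbers are all hypothetical.

```python
import random


class RecencyWeightedBandit:
    """Epsilon-greedy bandit whose estimates favor recent rewards.

    Each arm is a candidate; the reward is page views for one new article.
    A constant step size makes old observations fade, so the estimate can
    follow a falling average if readers lose interest in a candidate.
    """

    def __init__(self, arms, epsilon=0.1, step_size=0.2, seed=None):
        self.estimates = {arm: None for arm in arms}  # None = not yet tried
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = random.Random(seed)

    def choose(self):
        """Pick the candidate to write the next article about."""
        untried = [arm for arm, est in self.estimates.items() if est is None]
        if untried:
            return untried[0]  # try every arm once before exploiting
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.estimates))  # explore
        return max(self.estimates, key=self.estimates.get)  # exploit

    def update(self, arm, reward):
        """Move the arm's estimate a fixed fraction toward the new reward."""
        old = self.estimates[arm]
        if old is None:
            self.estimates[arm] = float(reward)
        else:
            self.estimates[arm] = old + self.step_size * (reward - old)


# Hypothetical page views per article, for illustration only.
bandit = RecencyWeightedBandit(["Clinton", "Trump"], epsilon=0.0)
bandit.update("Clinton", 600)
bandit.update("Trump", 500)
first_pick = bandit.choose()  # Clinton has the higher estimate

# Simulate readers losing interest: new Clinton articles draw fewer views.
for _ in range(2):
    bandit.update("Clinton", 200)
later_pick = bandit.choose()  # Clinton's estimate has dropped below Trump's

print(first_pick, later_pick)
```

With `epsilon=0.0` the demo is purely greedy so that it is deterministic. In practice a small positive `epsilon` keeps the algorithm sampling the other candidates, which is how it would notice if interest in them changed.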
Question (2) is more difficult to answer and requires more domain knowledge. I would say it is possible at any moment to write hundreds of interesting articles on each candidate; the real question is how much work it takes. As I mention in the blog post, I am convinced that journalists find it easier to write interesting articles about Trump. So in some sense it's rational for them to do so: the 'return on investment' is higher because it's so cheap to churn out another article on Trump's latest soundbite. However, one could also argue that -- in the name of increased page views, or in the name of a functioning democracy -- they should make the extra effort to write an interesting article on the other candidates.
That's not true. If you're looking at the interactive chart, you need to check the box that says "Show page views per article". Otherwise you're just seeing raw pageviews, and Trump has more of those because there were so many more articles written on him.
I think that's what @nimblegorilla was suggesting, and I think it actually fits with what the article was saying - the ROI on writing an article on Trump is much better than for other candidates because it's easy to write yet another article, and still get loads of pageviews for it.
In addition, Trump has significantly pushed up the total number of pageviews going to election cycle articles, even if the articles specifically about him are not as popular as the articles about Sanders, Clinton, Cruz, etc.
[1]: https://www.parse.ly/careers/python_data_engineer
[2]: https://www.parse.ly/careers/machine_learning_engineer
[3]: https://www.techmeme.com/210208/p12#a210208p12