of the IT/storage requirements of some of the projects with which we are involved.
The data captured as part of the pilot projects was of such volume that transferring it between EMBL-EBI and NCBI in the USA used our full internet capacity and took several days.
SNS: Please give some idea of the IT/storage infrastructure that underpins the research projects - ie is there a basic infrastructure, to which is added the necessary processing power and storage depending on the research project being provisioned?
PJ: A general overview of our infrastructure: Linux server farms (for computational needs), large amounts of NFS-based storage, and database infrastructure using Linux and SAN storage.
SNS: Are you dealing with a single site when it comes to users and IT/storage resources, or coping with a disparate user base and a grid computing infrastructure?
PJ: The EBI’s IT and storage resources are used by EBI staff and the global scientific community. Servers are located at four geographically dispersed Data Centres.
SNS: Presumably storage requirement has grown into the Petabyte field - making some kind of a tiered storage approach crucial to optimise both CAPEX and OPEX?
SUMMER 10 WWW.SNSEUROPE.COM
PJ: The EBI data has doubled every year and is now in the region of 10 Petabytes. We use fibre channel and SAS disk systems for performance-critical tasks and high-density SATA systems for the rest.
SNS: Can you give some idea as to how the IT/storage infrastructure has changed over the years - ie the processing power and storage capacity requirements - ie a move from high performance computing into the supercomputing bracket, perhaps?
PJ: We have moved from proprietary UNIX servers to commodity-based Linux server farms.

Year    CPU cores    Disks (TB)
1996    50           0.2
1997    60           0.5
1998    75           2
1999    100          3
2000    140          5
2001    220          10
2002    420          20
2003    500          30
2004    700          55
2005    860          100
2006    2675         238
2007    3777         500
2008    5500         2500
2009    9300         6000

Our storage technology has developed from local / directly attached storage, through NFS gateways, to: NFS gateways and caches; high-performance NFS; high-performance parallel file systems; and scale-out NAS NFS storage solutions.
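As a rough check on the growth figures quoted in this interview, the compound annual growth factor can be worked out from the first and last years of the table. This is a minimal sketch, not an EBI calculation: it assumes the table's endpoints read as 50 cores and 0.2 TB in 1996 and 9300 cores and 6000 TB in 2009, and the function name `annual_growth_factor` is purely illustrative.

```python
def annual_growth_factor(start_value, end_value, years):
    """Return the constant yearly multiplier that turns start_value
    into end_value after the given number of yearly steps."""
    return (end_value / start_value) ** (1 / years)


# Endpoints as read from the interview's table (an assumption, see above).
years = 2009 - 1996  # 13 yearly steps between the first and last column

storage_factor = annual_growth_factor(0.2, 6000, years)  # disk capacity in TB
cores_factor = annual_growth_factor(50, 9300, years)     # CPU core count

print(f"Storage grew by a factor of about {storage_factor:.2f} per year")
print(f"CPU cores grew by a factor of about {cores_factor:.2f} per year")
```

Under those assumptions, storage grows by a little over 2x per year, in line with the statement that the data has doubled every year, while the core count grows by roughly 1.5x per year.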
SNS: Also, have the research compute demands driven the development of IT, or have IT developments allowed more sophisticated research... which drives which?
PJ: An ever-increasing demand for more processing and storage capability from the EBI researchers has driven us to find IT solutions that enable them to work / research to their maximum capacity.
SNS: Does the advent of technologies such as virtualisation, deduplication (or are the files all image-based?) and The Cloud offer any hope for the future re easing the data storage burden?
PJ: We have always deployed leading-edge solutions because of our massive growth.
SNS: Could you describe the IT/storage demands of a current specific project?
PJ: EBI projects are by and large using centrally shared resources (compute farms, database platforms & general purpose storage). This is an effective way of using our resources, rather than giving each project specific areas of storage / compute usage.
SNS: How do you back up petabytes of data in a timely fashion?
PJ: Our backup strategy is based on replicated disk-based storage. We copy and synchronize every night to a remote Data Centre. We also take snapshots of the data and retain these for a week. In addition, our secondary
PETE JOKINEN
1-2-1